sueii
Contributor

Strange intermittent network issue

I suspect this may be a Windows 10 issue rather than a VMware one, but I thought I would ask here in case anyone has encountered the same problem.

I've been using VMware Workstation Player on my Windows 10 Pro desktop PC for over a year with no problems. There have been no recent changes other than the usual Windows updates. I use PuTTY to connect to my VMs. Suddenly PuTTY can't connect; the connection just times out. This affects multiple VMs. The VMs also can't ping each other, which they normally can, but they can ping Internet addresses. If I reboot my PC, the problem goes away and my network is back to normal. The next time I start the PC the issue is back; another reboot, and all is fine again.

I've tried restarting the VMs. I've tried stopping and restarting the VMware services.

Grateful for any suggestions as to what I could try next. It's annoying to keep having to reboot. 

Sue

 

RaSystemlord
Expert

I haven't seen this, but I think I have a direct suggestion. 

By VMs, I gather you mean VMs on a NAT network. They should indeed be able to find each other, and the Host should be able to find them. I also assume that you "ping hostname". I suspect that "ping ip-address" would work, which is a workaround, though not a nice one.

If the above is true, then I think the Windows 10 internal DNS functionality is unstable. You can work around this by editing the "hosts" file to include the name-to-IP mappings (the folder is "C:\Windows\System32\drivers\etc"). A reboot is required after that change (not for ping, but for much other functionality it IS required). The editing needs to happen on the Host. If the VMs need to see each other (in the same NAT network), then you also need to edit the "hosts" file on each VM.
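
A minimal sketch of the "hosts" file step, in Python. It only works out which name-to-IP lines are still missing and prints them; it does not write to the file. The VM names and addresses are hypothetical placeholders, not values from this thread.

```python
from pathlib import Path

# Folder named in the post above; only meaningful on a Windows host.
HOSTS_PATH = Path(r"C:\Windows\System32\drivers\etc\hosts")

# Hypothetical VM names and NAT addresses, for illustration only.
VM_ENTRIES = {"vm-web": "192.168.100.11", "vm-db": "192.168.100.12"}


def missing_host_lines(hosts_text: str, entries: dict[str, str]) -> list[str]:
    """Return 'ip<TAB>name' lines for names not already mapped in hosts_text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into IP followed by host names.
        fields = line.split("#", 1)[0].split()
        known.update(name.lower() for name in fields[1:])
    return [f"{ip}\t{name}" for name, ip in entries.items()
            if name.lower() not in known]


if __name__ == "__main__":
    text = HOSTS_PATH.read_text() if HOSTS_PATH.exists() else ""
    for line in missing_host_lines(text, VM_ENTRIES):
        print(line)  # add these by hand, using an editor run as administrator
```

Printing rather than writing keeps the sketch harmless; the hosts file normally needs administrator rights to change anyway.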

This kind of problem also arises when VMs are copies of each other, although based on your description that isn't your case.

So, if the above solves it, this isn't a VMware-specific thing but a Windows 10 problem, and the solution is generic networking practice when operating in a NAT network.

sueii
Contributor

Thank you for that suggestion. Sadly, I've been using IP addresses (I should have mentioned that) and still get the same problem.

RaSystemlord
Expert

Well, if you use the IP address (and it is correct) and a reboot of the Host solves the problem ... this feels like Windows 10 networking instability.

This isn't really consistent with the above, but you may have some anti-virus software that acts as a firewall on the system. I'm not sure why its behavior would change when you reboot the Host, though. If not, well, it's one of those Windows 10 problems, I gather. There are regressions in Windows DLLs.

If it's a Windows Update problem (or rather a Feature Upgrade problem), a reinstall of VMware might help ... although usually in that case an application does not work at all before a reinstall.

sueii
Contributor

That was a good thought, about the anti-virus. So I turned my firewall off, just for a minute, to test. Nope. Still no connection.

Re-installing VMware seems like a sledgehammer to crack a nut. Also, I agree it feels more like a Win10 issue than a VMware one. I will keep that as a last resort.

RaSystemlord
Expert

"Re-installing VMware seems like a sledgehammer to crack a nut."

Yes, prior to Win 10 this wasn't a likely candidate for anything. Now the "continuous updating model" has made it a relevant consideration after Windows Updates, and mostly after Upgrades. No wonder Windows 11 doesn't have that updating model, or so some media assume.

But, as mentioned already, typically an application stops working because of bad or missing DLLs. You might want to study the logs (in the 'temp' folders) and Windows Event Viewer to find out what actually fails. Sometimes that solves the case, or at least makes it clearer.

It is also possible for any application installation to compromise another application or Windows functionality. An uninstall does not necessarily help, because the uninstaller is written by the same people who wrote the faulty installer and application. However, this falls under the same "bad or missing DLL" consideration, where a reboot would not cure the problem. Also, you haven't reported any application installs or uninstalls having an effect on this.

EDIT: Actually, turning off the firewall might not be enough. For some changes to really take effect you need to reboot. Windows doesn't ask for a reboot in many cases, but it still needs one (or needs "something" to be run, and it can be hard to work out what that "something" is).

sueii
Contributor

I've been working away on this problem. It's still happening.

I did re-install VMware and also took the opportunity to upgrade from v15 to v16. It made no difference, but it was useful to try.

The only recent change I had made that could potentially affect the network was installing TunnelBear, so I uninstalled that. It made no difference.

I have tried comparing the network settings, via ipconfig, between when the connections are working and when they are not, but I can see no differences. I also looked at the VMware services and the Windows network-related services. They all look the same whether it's working or not.
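
Comparing two outputs by eye is error-prone, so here is a minimal Python sketch of doing it mechanically. It assumes the output was saved to text files in each state, for example with "ipconfig /all > working.txt" and "ipconfig /all > broken.txt"; those file names are placeholders.

```python
import difflib
from pathlib import Path


def diff_snapshots(working: str, broken: str) -> list[str]:
    """Return only the lines that differ between two saved command outputs.

    Lines prefixed '- ' appear only in the working snapshot,
    lines prefixed '+ ' appear only in the broken one.
    """
    delta = difflib.ndiff(working.splitlines(), broken.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in delta if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Placeholder file names; the files must have been saved beforehand.
    a, b = Path("working.txt"), Path("broken.txt")
    if a.exists() and b.exists():
        for line in diff_snapshots(a.read_text(), b.read_text()):
            print(line)
```

An empty result would back up the observation that the settings really are identical in both states, which points away from the adapter configuration itself.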

I'm running out of ideas now 😞

 

RaSystemlord
Expert

As for a new idea: at some point I was struggling to get a static IP to work in NAT. The solution was to enable DHCP. Requiring DHCP to be active makes no sense when using a static address, but that is what it took.

So why would that all of a sudden change in your system? No idea, but I guess some update to the system might trigger this strange behavior.

sueii
Contributor

Cheers, that's given me something else to try

sueii
Contributor

Hi all, remember this issue I raised last year? It's still happening. The DHCP idea didn't work, but thanks for the suggestion, RaSystemlord.

I've come up with another clue, though. I've pinned the issue down to shutdown/startup of the host PC (Windows 10 Pro) versus restart.

On startup, I rarely, if ever, get a connection between guest and host.  After a restart, I pretty much always do get a connection.

So what's the difference?  From some Google searches ...

"A shutdown in Windows 8 or 10 is not really a shut down, the state of RAM is written to the hard disk and then read back into RAM when you start your system again.

Whereas a Restart does actually clear RAM and does not read the state of Windows. That is why in Windows 8 and 10 a restart takes a lot longer than turning off and back on your PC"

How this affects VMware, I don't know. It's quite annoying to have to restart before I can use my VM images, but at least I can use them.

If this gives anyone an idea for a permanent fix, do please let me know.

Cheers all

Sue

 

RaSystemlord
Expert

@sueii  Thanks for the heads-up!

However, I don't think that the Internet explanation is correct. Shutdown is shutdown. On a physical computer it is better than Restart, because it turns the hardware off (which can be necessary in some cases; rarely, you even need to drain the capacitors by pressing the power button while the power cord is unplugged).

There is a difference between shutdown vs sleep-standby-suspend (or whatever it is called) vs hibernate.

In some versions of Windows, these concepts are muddled in the user interface ... it is possible that a button labeled shutdown doesn't really do that. In that case, you need to look further into the GUI and find the real shutdown.

There is no difference in the loading time after Restart or Shutdown, in the normal case of Windows 10 and a VM. However, there is a big difference if you use sleep. Hibernate is good if you need to shut the physical system off completely and continue later EXACTLY where you left off. (I used that with Win7 on a physical system with a VM running Oracle & stuff in a live process; it works without problems.) Hibernate does not consume any power.

When copying or moving VMs from one system to another, which works very well even between different OSes and different hardware, you need to find the Real Shutdown in order to avoid problems when starting the VM on the other system (with a different OS and completely different hardware). If you do, there are no problems (except possibly with an NTFS filesystem as the location of the new VM on Linux; use a Linux-specific FS. I'm not sure whether later Linux kernels address that already).

sueii
Contributor

Thanks for that explanation. Indeed, I've always thought that shutdown would be more thorough than restart.  I definitely do full shutdowns including power off and switch off at the PC's plug. Not hibernate.

The fact remains that after half a year of having this issue, I can say with some confidence that after a shutdown (with power off) and startup the VMs start up fine, but the host can't connect to them, i.e. I can log in via the VMware console, but I can't open an SSH session to the VM with PuTTY. Or, if I have a web server running on the VM, I can't connect to it with the host's browser.

Whereas, after a restart (and no power off), I can get a PuTTY connection. I can then, reliably, get multiple SSH or HTTP connections.
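
For anyone wanting to test this state without opening PuTTY or a browser each time, a minimal Python sketch of a TCP reachability check from the host follows. The guest address is a hypothetical placeholder, and ports 22 and 80 are assumed to be the SSH and HTTP defaults.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


if __name__ == "__main__":
    vm_ip = "192.168.100.11"  # hypothetical guest address; substitute your own
    for name, port in (("ssh", 22), ("http", 80)):
        state = "open" if port_reachable(vm_ip, port, timeout=1.0) else "unreachable"
        print(f"{name} ({port}): {state}")
```

Running it once after a cold start and once after a restart gives a quick, repeatable record of which state the host is in.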

I can only guess that there is some corruption creeping in, either in the VMware services or the PC's network services, that gets cleared by a restart but not by a shutdown. At least that was my guess after reading the Google post; not so much now.

If not corruption, then maybe the order in which services start is the issue; perhaps some services start up more easily or quickly after a restart, which prevents some kind of conflict.

Or maybe it's a hardware issue, with some residual power to a component being retained through a restart but not after a power off.

Not sure what I can do to diagnose further or resolve any of those things. Any ideas are most welcome.

On the plus side, now that I've found that a restart gives me good connections, it's much less of a problem.

Thanks very much for taking an interest RaSystemlord.

RaSystemlord
Expert

Thanks @sueii !

Your point that some component can become available after a Restart but not necessarily after a Shutdown is very valid. I just recently had that with the network adapter of an older workstation. However, that was a physical computer, where electronics are in play.

Why VMware would inherit similar behavior is hard to imagine. Or is it a matter of how Windows starts the VMware-specific drivers? I have no inside information on that, but there was once a discussion suggesting that some of that perhaps takes place in RAM. Are you perhaps on the edge of sufficient RAM allocation within the VM (or running out of memory on the physical computer)? That might explain delays in loading drivers. Still, I have no theory on why the behavior is different after a Restart than after a Shutdown.

sueii
Contributor

Oh yes, definitely I'm using lots of RAM in my VMs. Plus the Google search that I mentioned earlier (I know it's the Internet so take with a pinch of salt) did mention clearing memory as a difference between restart and shutdown/startup. So you could have something there.  Not sure what I can do about it though, except buy a better spec PC maybe. 

RaSystemlord
Expert

@sueii 

Not sure, but electronics don't clear up memory in Restart, because electronics don't power down. However, that doesn't tell anything about the OS using any of the non-cleared memory in RAM. It doesn't make any sense that Reboot would use non-cleared memory - for me, that is. This may also depend on whether you have a memory check activated in BIOS and whether it does it, in the case of Reboot ... in cold Restart, it obviously always does.

What does VMware virtual hardware do in this case - no clear idea why it would be different to physical hardware. Maybe some developer can clarify this? Since Windows 10 VM shutdown, still, after a long time, locks up Linux hosts, from time to time AND Win11/Linux VM NEVER does that, there might be something funny going on in Windows 10 Shutdown OR its implementation in VMware. Just speculating.

Your case was about a Windows VM reboot, not the physical reboot.

As for what can be done for RAM, well, not sure if this relates to this problem or not, but since you asked ...:

- obviously, I have no idea what kind of memory allocation we are talking about. For large datasets 128 GB may be required - for moderate use, 16 GB is plenty.

- you can add memory. Or, if it's a "market PC", all the slots are filled and you have to replace everything with bigger modules. Relatively speaking, they don't cost that much. Decent memory for a modern workstation motherboard is about 80 euros per 16 GB. I would expect the same in dollars. There is no reason to buy a newer PC because of a lack of memory. Check what your maximum is - 10 years ago 16 GB - later 32 GB - nowadays 64/128 GB, but check to make sure.

- as for memory config, do the following:

   a) make sure that you NEVER run out of memory on the physical computer. If you do, everything will lock up, and getting out of that situation may take some time or a hard reset of the entire system. Add up the memory allocation of your running VMs and add something for the Host OS (like 4-6 GB for Win10, depending largely on what you do there, if anything). Watch the swap use (in Task Manager) to make sure.

   b) check that the memory allocation of each VM that you use is sufficient. Use Task Manager to find out. Do NOT exceed the physical memory of your physical computer.

   c) run fewer VMs at the same time, if it is not possible to make the changes mentioned above

   d) try to run fewer applications at the same time. Internet browsers can steal lots of memory. Run them on a different computer. Observe all the other applications and their memory consumption. For instance, database applications, like Oracle, can allocate lots of memory which they don't actually need for their operations (highly dependent on what you do).
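
The arithmetic in point a) can be sketched in a few lines of Python. The figures in the example (a 32 GB host running three 8 GB VMs) are illustrative assumptions, and the default reserve is simply the upper end of the 4-6 GB suggested above for the Host OS.

```python
def host_memory_headroom_gb(physical_gb: float,
                            vm_allocations_gb: list[float],
                            host_reserve_gb: float = 6.0) -> float:
    """Physical RAM left over after all running VMs and a Host OS reserve.

    A negative result means the Host would be overcommitted.
    """
    return physical_gb - host_reserve_gb - sum(vm_allocations_gb)


if __name__ == "__main__":
    # Illustrative figures only: 32 GB host, three VMs of 8 GB each.
    headroom = host_memory_headroom_gb(32, [8, 8, 8])
    print(f"headroom: {headroom:.1f} GB")
```

This is only a budget on paper; actual use still has to be watched in Task Manager, as described in a) and b).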

sueii
Contributor

Thanks for sticking with me,  some comments below ...

 

"Not sure, but electronics don't clear up memory in Restart, because electronics don't power down. However, that doesn't tell anything about the OS using any of the non-cleared memory in RAM. It doesn't make any sense that Reboot would use non-cleared memory - for me, that is. This may also depend on whether you have a memory check activated in BIOS and whether it does it, in the case of Reboot ... in cold Restart, it obviously always does."

Doesn't make sense to me either, just saying what I see.

"Your case was about a Windows VM reboot, not the physical reboot."

The host is Windows, the guest VMs are Linux.

"For large datasets 128 GB may be required - for moderate use, 16 GB is plenty."

128 GB? Ha! I wish. No, it's nowhere near that. The host has 32 GB and the VMs are mostly around 8 GB. I tend to run two or three at a time, but I have tested with just one and get the same issue. So I guess we can rule out using too much memory.

Thanks for the advice. You are totally right of course, but I have the PC I have and I want to run the VMs that I have. So long as restart keeps getting me connections, I guess I can cope. 🙂

RaSystemlord
Expert

Well, 32 GB with 2-3 VMs should be quite all right for any use that can be considered normal.

Sorry, I forgot your config - Windows Host and Linux guests. That opens up a new thing to consider and perhaps test ... I hope this hasn't been brought up before ...

... in the old days, Linux computers couldn't always get an IP address from regular (cheap, home) routers. Maybe that particular Linux, with its particular VMware Tools, cannot get a DHCP address from the Host.

The cure is easy: use a fixed IP address on the Linux guest. Since you have many guests running at the same time, perhaps you would like to have static addresses on them anyway (if they interact with each other). Obviously, your fixed IP needs to be outside the range of the DHCP server (on the Windows Host, in VMware).
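
A minimal Python sketch of the "outside the DHCP range" check. The subnet and pool boundaries shown are hypothetical; the real values have to be read from the VMware network settings on the Host.

```python
import ipaddress


def static_ip_ok(candidate: str, subnet: str,
                 dhcp_start: str, dhcp_end: str) -> bool:
    """True if candidate is a usable address in subnet and outside the DHCP pool."""
    ip = ipaddress.ip_address(candidate)
    net = ipaddress.ip_network(subnet)
    # The network and broadcast addresses cannot be assigned to a guest.
    usable = ip in net and ip not in (net.network_address, net.broadcast_address)
    in_pool = ipaddress.ip_address(dhcp_start) <= ip <= ipaddress.ip_address(dhcp_end)
    return usable and not in_pool


if __name__ == "__main__":
    # Hypothetical NAT subnet and DHCP pool, for illustration only.
    subnet, start, end = "192.168.100.0/24", "192.168.100.128", "192.168.100.254"
    for addr in ("192.168.100.50", "192.168.100.200", "192.168.101.5"):
        print(addr, static_ip_ok(addr, subnet, start, end))
```

The check does not know about addresses the NAT network reserves for its own gateway or the Host's virtual adapter, so those still need to be avoided by hand.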

Why Restart works then, still no idea, but I don't know the internals that well.

sueii
Contributor

I'm already using static IP addresses on the VMs. Having said that, the VMs can talk to each other regardless of startup or restart; it's the host they won't talk to, and the host does use DHCP (Sky broadband). I don't think I can do much about that. Also, like you, I don't see why a restart would make any difference to that.

I'm resigned to doing a restart before anything that involves host/guest communication. Thanks one more time for your help and suggestions.

Cheers

Sue

RaSystemlord
Expert

@sueii 

Yes, my answers may have been driving you nuts, because I haven't remembered what had already been established much earlier. Well, this is a long thread.

The fact that the VMs can talk to each other is new information. There is nothing wrong with the VMs' network. The problem is with Windows, or rather its configuration, or how it works with VMware. You have said that you tried a reinstall of VMware without any result. I hope that the reinstall was a relevant one, i.e. done after any other changes and especially after any Win 10 Feature Update. Somebody had strange problems when VMware wasn't installed in the default location but somewhere else. Typically, for true Windows software, installing somewhere else makes no sense, since most of the files will be installed on c: under some Windows directories anyhow.

So, if the above was "old news", you might consider studying Windows further. It seems that something is simply blocking networking from the Host towards the VMware networks. Do you have any 3rd party software running that might do it? Some virus scanners do more than just block viruses; they may have firewall capabilities. You say that you turned Windows Firewall off to see - how? Perhaps it depends on the Windows version, but that isn't enough - you need to reboot after that change ... there are many cases in Win 10 where it doesn't request a reboot but still needs one.

As for your Host being on DHCP over Broadband - that makes no difference ... that is "kind of" basically always the situation. Well, infrastructures vary - perhaps you mean that you are connected to your own router with DHCP in your own subnet, and the router is connected to a Broadband service provider, with DHCP or not. Either way, that's all fine for this discussion.

sueii
Contributor

Thanks for all your help, RaSystemlord. You haven't been driving me nuts; I have been grateful for your suggestions.

I finally have a solution or a workaround at least. I had found that after a restart of the Windows 10 host all would be well. But when starting up from power off I didn't get VM connections.  Someone suggested that I could try turning off Fastboot on my host. I did that and I've had no problems since then. 
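
For anyone wanting to check this setting on their own host, here is a minimal Python sketch. It assumes that "Fastboot" above means the Windows 10 Fast Startup feature (the post does not say so explicitly; some BIOS setups have a separate "Fast Boot" option). As far as I know, Windows records that setting in the HiberbootEnabled registry value read below, but please verify this on your own system. The script only reads the value and changes nothing.

```python
import sys

# Registry location assumed to hold the Fast Startup setting (under HKLM).
POWER_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Power"


def read_hiberboot_enabled():
    """Return the HiberbootEnabled value as an int, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POWER_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "HiberbootEnabled")
            return int(value)
    except OSError:
        return None  # key or value missing, or access denied


def describe(value):
    """Turn the raw registry value into a short human-readable status."""
    if value is None:
        return "unknown"
    return "Fast Startup on" if value else "Fast Startup off"


if __name__ == "__main__":
    print(describe(read_hiberboot_enabled()))
```

The setting itself is normally changed in Control Panel under Power Options, "Choose what the power buttons do", by unticking "Turn on fast startup". Running "powercfg /h off" from an elevated prompt disables hibernation altogether, which also turns Fast Startup off.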

I thought I would post the solution in case anyone else comes across similar issues.  And thanks again for your help.
