VMware Communities
noetus
Contributor

Workstation 11 - Not enough physical memory available error

I have installed an evaluation version of Workstation 11.1 (I have a license for Workstation 10 currently) on a Windows 8.1 machine.  Build is 2496824.

I have 7 VMs, all very similar Windows 7 installs, with 2GB allocated to each one and 32GB of RAM on the host system.

I cannot run more than 2 VMs concurrently.  When I try to start the third one, I get the "Not enough physical memory available" error.

I have read up on the issue with Microsoft KB 2995388, but apparently this issue was fixed in version 10.0.4, so it definitely shouldn't affect version 11.  Also, I did not run into this issue with Workstation 10 when it was installed on this host, with the same guests.

Even though I am a licensed user of Workstation 10, since I am just evaluating 11 I cannot get official support.  This seems a bit odd to me as an official policy, because it means that licensed users like me who are trying out new versions and run into issues cannot get support and therefore are less likely to solve the issue and upgrade.  VMware shooting itself in the foot, really.

Anyway, since my only recourse for help is this forum, can anybody assist?  If not, I downgrade back to Workstation 10 and VMware loses the opportunity to sell me a new license....

38 Replies
noetus
Contributor

Any update on this?  These are production machines and I am currently running at 50% of capacity...

dariusd
VMware Employee

Please attach the logfiles from the two crashes you saw earlier.  The message suggests that your guest OS is confused and has done something which would have caused a crash or reboot on a physical machine, but I'd need to check the logfiles to see if it wasn't confused as a result of some less severe earlier problem.

Repeating one of my earlier questions: What video card(s) are you using in the two systems?

And another thing to try: Is it possible for you to temporarily downgrade the affected host to only have 16 GBytes of physical RAM installed, as a troubleshooting/diagnostic step?

Thanks,

--

Darius

noetus
Contributor

Here are the logfiles.  It is a strange coincidence to get the same crash on both VMs at the same time.  That suggests to me that the issue is not with the guests themselves.  (Though they are similar, they are not identical, having undergone separate updates, various software installs and uninstalls, and so on, since they were both copied from the same source VM some months ago.)

Video cards on the two machines: on the problematic host it is a GeForce GTX 660 with 2GB memory.  On the Asus Rampage IV machine it is a GeForce GT 610, also with 2GB memory.

Removing some of the physical RAM is a possibility; how might this help to indicate the issue here?

dariusd
VMware Employee

Thanks for the logs...  They do give a reasonable timeline of the events leading up to the crash.  Both of your guest OSes were in the process of trying to BSOD as a result of what we thought was extreme memory pressure on the host.  It shouldn't have BSODed the guests, so I'll file a bug report for that... and the error message you ended up seeing was the result of a confused guest BSODing.

The best guess I have so far for the problems you're encountering would be something to do with the Memory Type Range Registers (MTRRs) on your host.  The MTRRs are configured by the firmware (hence my request to check for firmware updates), and tend to become more difficult to manage correctly on hosts with large amounts of RAM (hence my request to reduce available physical RAM to some degree) and with certain configurations of PCI/PCIe devices attached to the host (hence my query about video cards).

I'm not terribly familiar with the innards of the Windows graphics driver architecture, but it is also possible that the video driver(s) could have some responsibility for configuring the MTRRs, so a change of driver might also correlate with the problem occurring.

A failure of the host to correctly configure MTRRs could potentially restrict our ability to use RAM above 4 GBytes.

I don't know of any Windows-based tools to inspect the MTRRs, though, so I'm struggling to find ways to directly troubleshoot.

Another thing which could affect MTRRs would be host suspend/resume or hibernate/wake, so you might want to try disabling some of the more aggressive power management options in Windows too, particularly Hybrid Sleep.

But really, MTRR problems are just the best guess at the moment.  You can see (from the old machine) that Workstation itself is quite capable of handling all 32 GBytes; we just need to find the reason why your new platform's firmware+OS is only showing us some small subset of the installed RAM...

Cheers,

--

Darius

noetus
Contributor

I see.  This is much food for thought.  I also updated the host's firmware when I did the OS reinstall - is it also possible that the newer firmware is screwing things up somehow?

Unfortunately it appears that downgrading the BIOS isn't officially supported by Asus.  It is possible, but it is a messy process and somewhat riskier than a regular upgrade, so at this point it is something I'd be quite reluctant to try.

However the BIOS upgrade seems the only likely candidate for changing the MTRR configuration that you speak of.  It doesn't seem to me that a simple OS re-install would do that, and nothing else has changed in the system other than that and the BIOS upgrade.

Since it is quite a new BIOS I guess it is also possible that this hasn't been reported yet as an incompatibility with VMware Workstation and large RAM installations and that particular BIOS version on that particular board.

noetus
Contributor

I tried disabling Hybrid Sleep and that hasn't made a difference.

I've also pulled 16GB of RAM from the machine, so it now only has 16GB, showing 9.9GB with two guests booted up.  I still get the same error trying to boot up the third one.

What now?

dariusd
VMware Employee

What happens if you shut down the OS, let the machine power off completely, remove the mains cable for a few seconds, then plug it back in again and power it on afresh?

--

Darius

dariusd
VMware Employee

Also... If your priority is to get the host up and running as quickly as you can, you can try using the vmmon.disableHostParameters = "TRUE" workaround as described here: VMware KB: After installing Windows 7 SP1, VMware Workstation reports the error: Not enough physical...

That would reduce the amount of usable memory by a few GBytes compared to if the problem was solved "for real".  Even so, it should allow a lot more than the present limit of two concurrent VMs on that host.
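For anyone applying the workaround by script rather than by hand, here is a minimal Python sketch of the edit. Only the option name and value come from this thread; the helper name `set_config_option` is hypothetical, and the sketch assumes the file is a plain list of `key = "value"` lines. It works on the file's text only, so you would read and write whichever config file the KB article names (the path is not given in this thread), and back it up first.

```python
def set_config_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key = "value"` set exactly once.

    Hypothetical helper: assumes one `key = "value"` entry per line.
    """
    new_line = f'{key} = "{value}"'
    kept = []
    replaced = False
    for line in config_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name.lower() == key.lower():
            # Replace the first existing entry and drop any later duplicates.
            if not replaced:
                kept.append(new_line)
                replaced = True
        else:
            kept.append(line)
    if not replaced:
        # The option was not present, so append it at the end.
        kept.append(new_line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Illustrative input only; "some.other.option" is a made-up entry.
    before = 'some.other.option = "1"\n'
    print(set_config_option(before, "vmmon.disableHostParameters", "TRUE"))
```

Removing the line again (or setting it back) reverts the workaround, which matters here given the instability reported later in the thread.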

There are two ways of checking the MTRRs: The first would be to download and boot some Linux distribution on your host.  If you are comfortable with Linux, grab something like the Debian 7.8 amd64 netinst medium, put it onto a CD or a USB key, and boot the host using that, then grab a copy of the "dmesg" output.  The second would be to download and install WinDbg and use its !mtrr command to examine the MTRRs while connected to the local kernel (see Local Kernel-Mode Debugging (Windows Debuggers))... If you end up going the WinDbg route, the output of !vm 1 and !sysinfo smbios -v could also be useful to see.
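If you go the Linux route, the MTRRs can also be summarized with a few lines of Python. This is only a sketch under assumptions: it reads `/proc/mtrr`, which Linux kernels commonly expose (the post above only mentions `dmesg`), and the function name `summarize_mtrr` and the sample lines are illustrative, not taken from the affected host. It totals how much address space each memory type covers, which makes it easier to spot a host where little or nothing above 4 GBytes is marked write-back.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
_MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+):\s*base=0x(?P<base>[0-9a-fA-F]+)\s*\(\s*\d+MB\),"
    r"\s*size=\s*(?P<size>\d+)(?P<unit>[KM])B"
)
_TYPES = ("write-back", "write-combining", "write-through",
          "write-protect", "uncachable")


def summarize_mtrr(text: str) -> dict:
    """Return total covered size in KiB per memory type."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = _MTRR_LINE.search(line)
        if not match:
            continue  # skip anything that is not an MTRR register line
        size_kib = int(match.group("size"))
        if match.group("unit") == "M":
            size_kib *= 1024
        mem_type = next((t for t in _TYPES if t in line), "unknown")
        totals[mem_type] += size_kib
    return dict(totals)


# Illustrative sample only; these values are made up for the example.
SAMPLE = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x100000000 ( 4096MB), size= 4096MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: write-combining
"""

if __name__ == "__main__":
    try:
        with open("/proc/mtrr") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when not on Linux
    for mem_type, kib in summarize_mtrr(text).items():
        print(f"{mem_type}: {kib // 1024} MiB")
```

The raw `/proc/mtrr` or `dmesg` text is still the thing to post in the thread; the summary is just a quick sanity check.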

Cheers,

--

Darius

noetus
Contributor

Shutting down the OS, unplugging all power cables, and waiting for 10 minutes before switching on and rebooting again made no difference.  (Still with 16GB installed.)

Also, one of the first things I tried was the vmmon.disableHostParameters = "TRUE" work-around.  It also made no difference, and to confirm, I just tried that again.  Still the limit of only 2 machines that will boot concurrently.

I am starting to get frustrated with this.  Checking the MTRRs sounds like a fair bit of work, and at this point that is just a "best guess" as to the source of the problem.  None of the things you have suggested related to this (disabling hybrid sleep, pulling memory, waiting 10 minutes between reboots, the vmmon.disableHostParameters setting) has made any difference, which to my mind suggests the MTRRs might be the wrong direction.

Meanwhile, I now have two bluescreens from the two VMs that were booted up after trying the last thing (config change), with the message PAGE_FAULT_IN_NONPAGED_AREA.

VMware is completely broken on my machine, and there is nothing to indicate any problem with my host at all.

noetus
Contributor

Here are the two log files for the two bluescreens I just told you about.

dariusd
VMware Employee

I understand your frustration... This is frustrating for me too...  It's unfortunate that we have a very complex piece of software that is subject to many external factors from host hardware, firmware, drivers, OS, patches, and other third-party software and drivers.  It's really a bit of a nightmare to figure things out sometimes.

Can you provide the vmware.log from the failure to power on with vmmon.disableHostParameters = "TRUE"?  It is very interesting that it's failed that way too.

Cheers,

--

Darius

noetus
Contributor

Setting vmmon.disableHostParameters = "TRUE" seems to have caused some massive issues with the Windows OS.

First, I mentioned I got two blue screens on the two running VMs the first time I tried it.  I already posted the logs for those two a few messages up, so that gives one indication of how things went with that setting.

I tried again just now, this time rebooting after setting vmmon.disableHostParameters = "TRUE" in the config file.  After rebooting and starting Workstation as Administrator, I managed to get 4 VMs running briefly before massive problems set in that culminated in the host system crashing (freezing).  First, two of the four VMs crashed shortly after boot with an error I hadn't seen before, having to do with memory allocation.

Then the host system started becoming unresponsive and I started getting memory allocation errors from Windows Explorer and the two web browsers that I had open.  The system became sluggish and unresponsive to mouse clicks, and shortly thereafter froze completely.

I performed a power cycle and rebooted.  System seems stable once again (Workstation is not running now).  I've attached the logs for all four VMs in that last experiment; you can see that two of them crashed with memory allocation errors, and the other two suffered some sort of issues as well.

There is a 9 hour time difference between us so you won't see any more responses from me until tomorrow now.

noetus
Contributor

Forgot to attach the logs.  Here they are.

dariusd
VMware Employee

The logs you attached suggest more and more that your host has a problem either with its MTRRs or with its IOMMU (VT-d) configuration, both of which are configured partly by the system firmware and partly by the operating system.  That configuration problem is restricting the availability of memory above 4 GBytes for driver use.  The evidence is Workstation's inability to see more than about 3.7 GBytes of RAM, and the starvation of lockable memory when Workstation is told to go ahead and try to use more of your system's RAM (the vmmon.disableHostParameters = "TRUE" workaround).

It might also help to inspect the host's Windows event logs to see if there are any platform errors or misconfigurations being reported as the OS is booting.
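One way to pull those entries without clicking through Event Viewer is a small Python wrapper around the Windows `wevtutil` command-line tool. This is a sketch, not a definitive recipe: the function names are hypothetical, and the filter (levels 1 to 3, i.e. critical, error, and warning, from the System log) is an assumption about what is worth reading first. It has to be run on the Windows host itself.

```python
import subprocess


def build_event_query(max_events: int = 50) -> list[str]:
    """Build a wevtutil command listing recent System log problems."""
    # XPath filter: Level 1 = critical, 2 = error, 3 = warning.
    xpath = "*[System[(Level=1 or Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # human-readable text instead of XML
    ]


def recent_system_problems(max_events: int = 50) -> str:
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_event_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(recent_system_problems())
```

Entries logged around boot time are the ones to look at, since the question is how the firmware and OS initialized the platform.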

At this point, it sounds like you will need to roll back the firmware on the host (or contact Asus for technical support) if you want Workstation to run correctly.

Thanks,

--

Darius

noetus
Contributor

OK, thanks for all this info.


Do you think it is possibly a hardware fault of some sort?  (I did run the Windows Memory test on the full 32GB, which passed with no errors.)  I ask because I originally did the OS reinstall because I was having issues with Windows before, specifically with Windows Explorer: it would become unresponsive after a few minutes (e.g. no right-click on taskbar) and I would have to restart it, although general applications were working fine.  This kept going for a while and then finally there were other problems; specifically, the entire system was starting to become unresponsive.


I suspected either (a) Windows file corruption, (b) a conflict between Explorer and some Explorer enhancements I had installed, or (c) maybe a virus infection of some sort.  Rather than investigate the problem, I decided to simply reinstall Windows, as I had been planning to do that anyway for other reasons.

The reinstall seemed to be working perfectly until this happened.  Now I suspect a hardware fault that could explain the issues I was having before (pre new system board firmware update) and these issues with Workstation.

I could try reinstalling Windows again on the new system board firmware to see if that clears it up before investigating further along the lines you suggest.

dariusd
VMware Employee

I doubt it is a problem with the physical RAM installed in the host.

It seems more like a problem between the CPU, chipset and IOMMU.  Looking only at the symptoms we're seeing now, I'd say there is a low probability that it is a hardware fault specific to your unit (i.e. defective silicon chips) and a somewhat higher probability that the silicon is functioning as designed but is being configured incorrectly by the firmware.  I can't draw any conclusions from the described system behavior prior to the firmware update.

One other thought: Did you clear/reset your firmware configuration after installing the firmware update?  If not, it may be worthwhile doing so... See Section 2.2.7 of the P9X79 PRO manual to clear the CMOS, and then go into firmware setup and press <F5> to load optimized defaults.

It's also worthwhile checking your Windows event logs for the system boot -- I mentioned this in the previous post but you did not mention having followed through on it.  Sometimes Windows will log messages about problems it sees with the way the firmware has initialized the system.  The absence of such messages does not mean that there are no problems, but certainly a message might indicate a problem or something to further investigate.

Cheers,

--

Darius

noetus
Contributor

Resetting the CMOS on the system board did not resolve the issue.

Windows logs don't reveal anything useful at this stage.

At this point my choices are to see whether reinstalling Windows will help (by resetting the memory configuration you speak of), or simply to upgrade to Workstation 11 and open a support ticket (I currently don't have access to support, but by upgrading I would) to try to resolve the issue.  Reinstalling Windows will be a big time suck and may not help, so perhaps I should just go the second route.

dariusd
VMware Employee

My Windows kernel/driver expert here has suggested that there's a chance that this could result from an out-of-date or incompatible driver for some piece of hardware in your system.  Perhaps double-check that you've installed all of the latest drivers for your hardware, particularly the chipset drivers, and remove from the system (or disable in BIOS setup) any peripherals or devices that you're not using.

Beyond that, I can't think of anything further to try besides reinstalling Windows or downgrading the firmware.

Cheers,

--

Darius

noetus
Contributor

Reinstalling Windows seems to have done the trick.  All VMs can now be started together.
