VMware Communities
lawill5
Contributor

VMware Workstation 7.0 guest with VMware Tools intermittently freezes and becomes unreachable, then recovers as if nothing happened

Given info:

Host server: HP 585g5 series with 2 x quad-core AMD procs and 32GB of RAM.

Host OS: RHEL5U4 64bit with no other applications installed except VMware Workstation 7.0.

VMware guest storage device: We use a NAS (5 x 1TB SATA drives) on the same subnet as the host server, running jumbo frames. Since adding the NAS, we've migrated the VMs from local /u01 to NFS-mounted IP:/vmimages using hard mounts and wsize and rsize = 32768.

VMware guests: 2 x OEL4U7 and 2 x RHEL4U8 64bit (all have persistent disks and either 5GB or 8GB of RAM). When building each VM we select both procs and 2 cores per proc, giving each machine 4 procs. All machines are bridged with their own IP address. We don't necessarily run all VMs at the same time, but we do run a pair of them for development and test. We have VMware Tools installed on all of our VMs. We're on a private network detached from the internet, so there's pretty much no chance I'd be allowed to get you logs.

Unknown issue:

While running our VM, we'll do normal work (building an Oracle database) and then out of nowhere the screen freezes. The freeze can last anywhere from 30 seconds to 5 minutes or so before the screen releases itself. Developers discovered the issue because they would get kicked off the server (ssh); then it started happening frequently on the console as well. I have a monitor server that I use to ping the machine while it locks up: when I ping the IP address of the frozen VM, it returns host unreachable, then all of a sudden the pings work and life returns. I don't have any snapshot settings or anything like that set up.
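The ping check described above can be turned into a small logger that records when each freeze starts and ends, which makes it easier to line freezes up against vmware.log timestamps. This is only a minimal sketch, not the poster's actual monitor: the function names are hypothetical, and `ping_once` assumes the Linux iputils `ping` command with its `-c` (count) and `-W` (reply timeout in seconds) options.

```python
import subprocess
import time
from typing import Iterable, List, Tuple


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open when the samples run out is
    closed at the last timestamp seen.
    """
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        last = ts
        if not ok and start is None:
            start = ts                      # first failed ping of a new outage
        elif ok and start is not None:
            windows.append((start, ts))     # host answered again
            start = None
    if start is not None:
        windows.append((start, last))
    return windows


def ping_once(host: str) -> bool:
    """Send one ICMP echo (assumes Linux iputils ping); True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host: str, interval: float = 1.0, count: int = 600) -> List[Tuple[float, float]]:
    """Ping `host` `count` times, `interval` seconds apart, and return outages."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval)
    return outage_windows(samples)
```

Calling `monitor()` with the guest's IP address from the monitor server returns a list of (start, end) epoch timestamps, one pair per freeze seen during the run.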

At first, looking through some discussions, we thought it was an NFS mounting issue or a time sync issue between the host and the VMs. We've fine-tuned those now, but to no avail.

-- One of our co-workers thinks it might have something to do with using multiple processors; he recommends using one proc per VM and one core.

Question: from what I've read in the VMware best practices, we should be good, so I'm not sure why we're freezing up all the time. Please help.

1 Solution

Accepted Solutions
continuum
Immortal

Hi

if you opened a support case then I can no longer help.

I do not work for VMware and do not have access to those areas




_________________________

VMX-parameters- WS FAQ -[ MOAcd|http://sanbarrow.com/moa241.html] - VMDK-Handbook


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

8 Replies
lawill5
Contributor

Additional Info:

We were able to reset all VMs currently running (2) to 1 processor with only 1 core per processor. We still had a freeze today.

continuum
Immortal

can you post the vmware.log next time such a freeze happens ?




lawill5
Contributor

OK, here is the last 30 minutes of the vmware.log.

At 18:10:51 the server froze. This time I was sitting right there and noticed that the host clock kept ticking; then the machine unfroze and the guest clock raced to catch up with the host's clock. Amazing, I didn't know the machine could do that. It immediately slowed down to normal time once they were in sync.

Looking over the past 30 minutes, I'm wondering what the following statements mean.

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:11:52.098: vmx| ide1:0: Command TEST UNIT READY took 60.006 seconds (ok)
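Lines of this form read as the time a virtual device took to complete the named command, so pulling them out of the full vmware.log gives a quick list of the slowest operations around each freeze. The following is a minimal sketch under that reading; the regular expression is written only against the line format shown in this post, and the names `SlowCommand` and `slow_commands` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Written against lines such as:
#   Aug 24 18:11:52.098: vmx| ide1:0: Command TEST UNIT READY took 60.006 seconds (ok)
SLOW_CMD = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:.]+): [\w-]+\| "
    r"(?P<device>\w+:\d+): Command (?P<command>.+?) "
    r"took (?P<seconds>\d+(?:\.\d+)?) seconds"
)


class SlowCommand(NamedTuple):
    stamp: str      # timestamp text exactly as logged
    device: str     # virtual device node, e.g. "scsi0:0"
    command: str    # command name, e.g. "WRITE(10)"
    seconds: float  # reported duration


def slow_commands(lines: Iterable[str], threshold: float = 1.0) -> List[SlowCommand]:
    """Return every 'Command ... took N seconds' entry with N >= threshold."""
    found = []
    for line in lines:
        match = SLOW_CMD.match(line.strip())
        if match is None:
            continue                         # not a command-latency line
        seconds = float(match.group("seconds"))
        if seconds >= threshold:
            found.append(
                SlowCommand(
                    match.group("stamp"),
                    match.group("device"),
                    match.group("command"),
                    seconds,
                )
            )
    return found
```

For example, `slow_commands(open("vmware.log"), threshold=10.0)` would list only the commands that took ten seconds or longer, together with the device they were issued to.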

Below is the last 30 minutes:

Aug 24 17:29:49.932: mks| MKS lost grab

Aug 24 17:31:28.799: vmx| ide1:0: Command TEST UNIT READY took 60.054 seconds (ok)

Aug 24 17:31:28.800: mks| MKS lost grab

Aug 24 17:32:29.853: vcpu-0| DISKLIB-LIB : numIOs = 19000000 numMergedIOs = 384315 numSplitIOs = 35794

Aug 24 17:33:47.342: vcpu-0| DISKLIB-LIB : numIOs = 19050000 numMergedIOs = 384866 numSplitIOs = 35860

Aug 24 17:35:23.914: vmx| ide1:0: Command TEST UNIT READY took 59.772 seconds (ok)

Aug 24 17:35:44.739: vcpu-0| DISKLIB-LIB : numIOs = 19100000 numMergedIOs = 385802 numSplitIOs = 35982

Aug 24 17:36:07.733: vcpu-0| DISKLIB-LIB : numIOs = 19150000 numMergedIOs = 386195 numSplitIOs = 36058

Aug 24 17:36:29.717: vcpu-0| DISKLIB-LIB : numIOs = 19200000 numMergedIOs = 386741 numSplitIOs = 36142

Aug 24 17:36:56.838: vcpu-0| DISKLIB-LIB : numIOs = 19250000 numMergedIOs = 387980 numSplitIOs = 36219

Aug 24 17:37:41.379: vcpu-0| DISKLIB-LIB : numIOs = 19300000 numMergedIOs = 388388 numSplitIOs = 36285

Aug 24 17:39:36.439: vmx| DISKLIB-LIB : numIOs = 19350000 numMergedIOs = 389290 numSplitIOs = 36370

Aug 24 17:40:47.480: vcpu-0| DISKLIB-LIB : numIOs = 19400000 numMergedIOs = 391633 numSplitIOs = 36548

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.756 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.275 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.275 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.275 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.275 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.274 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.274 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.274 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.274 seconds (ok)

Aug 24 17:41:13.216: vmx| scsi0:0: Command WRITE(10) took 1.274 seconds (ok)

Aug 24 17:41:48.830: vmx| DISKLIB-LIB : numIOs = 19450000 numMergedIOs = 393355 numSplitIOs = 36723

Aug 24 17:42:33.617: vcpu-0| DISKLIB-LIB : numIOs = 19500000 numMergedIOs = 396464 numSplitIOs = 36817

Aug 24 17:43:57.987: vcpu-0| DISKLIB-LIB : numIOs = 19550000 numMergedIOs = 397044 numSplitIOs = 36874

Aug 24 17:45:08.972: vcpu-0| DISKLIB-LIB : numIOs = 19600000 numMergedIOs = 397978 numSplitIOs = 37001

Aug 24 17:45:55.015: vcpu-0| DISKLIB-LIB : numIOs = 19650000 numMergedIOs = 398362 numSplitIOs = 37084

Aug 24 17:46:36.536: vcpu-0| DISKLIB-LIB : numIOs = 19700000 numMergedIOs = 398639 numSplitIOs = 37118

Aug 24 17:47:27.947: vmx| DISKLIB-LIB : numIOs = 19750000 numMergedIOs = 400180 numSplitIOs = 37252

Aug 24 17:48:14.320: vcpu-0| DISKLIB-LIB : numIOs = 19800000 numMergedIOs = 400626 numSplitIOs = 37309

Aug 24 17:49:24.054: vmx| DISKLIB-LIB : numIOs = 19850000 numMergedIOs = 401513 numSplitIOs = 37390

Aug 24 17:50:41.407: vcpu-0| DISKLIB-LIB : numIOs = 19900000 numMergedIOs = 403905 numSplitIOs = 37580

Aug 24 17:51:44.347: vcpu-0| DISKLIB-LIB : numIOs = 19950000 numMergedIOs = 405753 numSplitIOs = 37783

Aug 24 17:52:24.658: vmx| DISKLIB-LIB : numIOs = 20000000 numMergedIOs = 408672 numSplitIOs = 37809

Aug 24 17:53:13.537: mks| MKS lost grab

Aug 24 17:53:15.022: mks| MKS lost grab

Aug 24 17:53:17.414: mks| MKS lost grab

Aug 24 17:53:50.381: vcpu-0| DISKLIB-LIB : numIOs = 20050000 numMergedIOs = 409479 numSplitIOs = 37882

Aug 24 17:54:45.144: mks| MKS lost grab

Aug 24 17:54:54.743: vcpu-0| DISKLIB-LIB : numIOs = 20100000 numMergedIOs = 410393 numSplitIOs = 37998

Aug 24 17:55:36.872: vcpu-0| DISKLIB-LIB : numIOs = 20150000 numMergedIOs = 410781 numSplitIOs = 38088

Aug 24 17:56:19.929: vmx| DISKLIB-LIB : numIOs = 20200000 numMergedIOs = 411138 numSplitIOs = 38157

Aug 24 17:57:21.335: vcpu-0| DISKLIB-LIB : numIOs = 20250000 numMergedIOs = 412536 numSplitIOs = 38255

Aug 24 17:58:24.590: vcpu-0| DISKLIB-LIB : numIOs = 20300000 numMergedIOs = 413158 numSplitIOs = 38378

Aug 24 17:59:29.810: vcpu-0| DISKLIB-LIB : numIOs = 20350000 numMergedIOs = 414181 numSplitIOs = 38423

Aug 24 18:01:10.124: vmx| DISKLIB-LIB : numIOs = 20400000 numMergedIOs = 416935 numSplitIOs = 38667

Aug 24 18:01:51.144: vmx| DISKLIB-LIB : numIOs = 20450000 numMergedIOs = 419011 numSplitIOs = 38843

Aug 24 18:02:35.870: vcpu-0| DISKLIB-LIB : numIOs = 20500000 numMergedIOs = 422269 numSplitIOs = 38904

Aug 24 18:03:35.388: vmx| DISKLIB-LIB : numIOs = 20550000 numMergedIOs = 422839 numSplitIOs = 38975

Aug 24 18:04:29.175: vmx| DISKLIB-LIB : numIOs = 20600000 numMergedIOs = 423679 numSplitIOs = 39076

Aug 24 18:05:02.450: vcpu-0| DISKLIB-LIB : numIOs = 20650000 numMergedIOs = 424068 numSplitIOs = 39156

Aug 24 18:05:38.819: vmx| DISKLIB-LIB : numIOs = 20700000 numMergedIOs = 424979 numSplitIOs = 39268

Aug 24 18:07:16.698: vmx| ide1:0: Command TEST UNIT READY took 58.563 seconds (ok)

Aug 24 18:07:23.012: vcpu-0| DISKLIB-LIB : numIOs = 20750000 numMergedIOs = 425855 numSplitIOs = 39319

Aug 24 18:07:52.640: vcpu-0| DISKLIB-LIB : numIOs = 20800000 numMergedIOs = 426429 numSplitIOs = 39434

Aug 24 18:08:31.017: vmx| DISKLIB-LIB : numIOs = 20850000 numMergedIOs = 427291 numSplitIOs = 39485

Aug 24 18:08:59.354: vcpu-0| DISKLIB-LIB : numIOs = 20900000 numMergedIOs = 429600 numSplitIOs = 39665

Aug 24 18:09:28.428: mks| MKS lost grab

Aug 24 18:09:29.207: mks| MKS lost grab

Aug 24 18:09:38.035: mks| MKS lost grab

Aug 24 18:09:38.798: mks| MKS lost grab

Aug 24 18:09:42.618: mks| MKS lost grab

Aug 24 18:10:28.022: vcpu-0| DISKLIB-LIB : numIOs = 20950000 numMergedIOs = 431043 numSplitIOs = 39709

Aug 24 18:10:38.497: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.497: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.497: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.497: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.723 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.719 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.724 seconds (ok)

Aug 24 18:10:38.507: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.720 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:10:38.508: vmx| scsi0:0: Command WRITE(10) took 1.725 seconds (ok)

Aug 24 18:11:52.098: vmx| ide1:0: Command TEST UNIT READY took 60.006 seconds (ok)

Aug 24 18:11:52.098: mks| MKS lost grab

Aug 24 18:11:52.565: mks| MKS lost grab

Aug 24 18:11:53.445: vmx| DISKLIB-LIB : numIOs = 21000000 numMergedIOs = 431820 numSplitIOs = 39864

Aug 24 18:11:56.005: mks| MKS lost grab

Aug 24 18:12:04.633: mks| MKS lost grab

Aug 24 18:12:30.912: mks| MKS lost grab

Aug 24 18:12:39.249: mks| MKS lost grab

Aug 24 18:12:41.653: vcpu-0| DISKLIB-LIB : numIOs = 21050000 numMergedIOs = 432357 numSplitIOs = 39929

Aug 24 18:13:09.904: vcpu-0| DISKLIB-LIB : numIOs = 21100000 numMergedIOs = 433254 numSplitIOs = 40040

Aug 24 18:13:28.151: vmx| DISKLIB-LIB : numIOs = 21150000 numMergedIOs = 433649 numSplitIOs = 40118
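In the excerpt above, each DISKLIB-LIB line appears after another 50,000 I/Os, so the spacing between consecutive lines gives a rough average I/O rate for the guest disk. A minimal sketch of that arithmetic follows; it assumes all lines come from the same day (no midnight rollover), and the function name `io_rates` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Captures the time of day and the running numIOs counter from lines such as:
#   Aug 24 17:32:29.853: vcpu-0| DISKLIB-LIB : numIOs = 19000000 numMergedIOs = ...
DISKLIB = re.compile(
    r"(\d+):(\d+):(\d+\.\d+): [\w-]+\| DISKLIB-LIB\s*: numIOs = (\d+)"
)


def io_rates(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (seconds_since_midnight, IOs_per_second) for each pair of
    consecutive DISKLIB-LIB lines. Assumes every line is from the same day."""
    points = []
    for line in lines:
        match = DISKLIB.search(line)
        if match is None:
            continue                         # skip mks, scsi and ide lines
        hours, minutes, seconds, num_ios = match.groups()
        time_of_day = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        points.append((time_of_day, int(num_ios)))

    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        if t1 > t0:                          # guard against duplicate timestamps
            rates.append((t1, (n1 - n0) / (t1 - t0)))
    return rates
```

A stall shows up as an unusually low rate for the interval that contains it, which can then be compared with the slow-command lines and the ping outages.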

continuum
Immortal

Please attach the full log next time. It is much easier to read, and I need additional info about your vmdks, so please no cuts ;-)

First impression: are your disks healthy? I mean the physical ones.

Or do you use a growing-type disk that has not been shrunk in months?




lawill5
Contributor

Hey, since we last wrote each other, we have:

  • Created Service Request 1571537491 (paid per-incident support)

  • Upgraded to 7.1.1

  • Throttled procs to 1 proc and 2 cores/proc

  • Set the VMs to use host RAM only and not use swap

  • Got permission to dirty-word scrub and upload the "Collect Support Data" output

Not sure if you have access to help out now, but we'll see.

continuum
Immortal

Hi

if you opened a support case then I can no longer help.

I do not work for VMware and do not have access to those areas




jartim
Contributor

Not a fix for you, but I do have the same problem on an XP xw8600 workstation with a Xeon quad-core processor and 4GB of RAM. Just as you say, the guest VM intermittently locks up and is totally unresponsive for what appears to be a random length of time, then suddenly springs back to life. This happens regularly with just one guest running, and also with several guests running. It happens with single-core guests and multiple-core guests. So far I can identify no particular setting from one guest to another that has any effect on the lockups. This only started when I upgraded from Workstation 6.5 to 7.1.1, so I guess the problem is somewhere in the core host software.

tester711
Contributor

Exact same upgrade (6.5 -> 7.1.1) and similar lockups.

BUT! The guests don't freeze while the host's Administrator desktop is active (Windows XP). Freezes are only observed when I fast-switch from the Windows Administrator user (who runs WS) to another user.
