VMware Cloud Community
kgottleib
Enthusiast

Performance issue with Guest OS

Hello Gurus - I have one of the most unusual issues I have run into in quite some time. Here it is:

I have a Windows Server 2008 R2 guest OS, installed in a vSphere 5.1 VM (hardware version 7), that has an Emulex LPe12002-E 8Gb dual-port HBA allocated to it as two PCI devices (one device per port) via DirectPath I/O.

The guest OS runs an application that is conducting some backup tests. The HBA's WWPNs are zoned in the fabric and registered in the Compellent Storage Center interface, and all is good there. The application creates a replay (snapshot) of a volume, mounts it as a disk, reads all of its blocks, unmounts it, and then repeats the process with whatever additional volumes it needs to back up. So it is kind of cool that I am doing this with a VM and not a physical host, right? (It might have been even cooler if I had NPIV working, but that's another story, no thanks to the HBA vendor's support of the technology...)

So what's the problem? The backup reads from the mounted disk run extremely slowly, as low as 5 MB/sec of throughput. Then, when other activity is taking place on the system, the throughput suddenly jumps to over 150 MB/sec and stays fast for a long time, finishing the jobs on schedule with each job taking a bit more than an hour. But when the slowness is happening, the same job (all jobs are identical; they simply read all of the blocks of a bunch of 200 GB volumes and write the data to a null target - remember, this is just testing) takes up to 10 hours to complete.
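
For context, each job is essentially doing nothing more than the loop below: a sequential read of every block of the mounted volume, with the data discarded and the throughput reported. This is just a minimal Python sketch; the device path is a hypothetical placeholder, and on Windows opening a raw volume (e.g. \\.\E:) requires administrator rights.

```python
# Minimal sketch of what each backup job is effectively doing: sequentially
# read every block of the mounted replay volume, discard the data, and
# report throughput in MB/sec. DEVICE is a hypothetical path.
import time

CHUNK = 1024 * 1024          # 1 MiB reads
DEVICE = r"\\.\E:"           # hypothetical mount point of the replay volume

def read_all_blocks(path: str) -> None:
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as disk:
        while True:
            data = disk.read(CHUNK)
            if not data:
                break
            total += len(data)
            if total % (100 * CHUNK) == 0:      # report every 100 MiB
                mb = total / (1024 * 1024)
                print(f"{mb:.0f} MB read, "
                      f"{mb / (time.monotonic() - start):.1f} MB/sec")

if __name__ == "__main__":
    read_all_blocks(DEVICE)
```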

What do I think the culprit is?   POWER MANAGEMENT.

I have checked just about every perf counter and event log known to man, and I don't see anything abnormal on the array, ESXi, or the VM - nothing. But the logs would not show any sort of power-saving activity that could be happening transparently.

FACTS on this case:

- memory reservation is set (it has to be for DirectPath I/O, otherwise you can't power on the VM)

- CPU reservation - was not set until about an hour ago; I just gave it a small reservation to rule this out, so I don't yet know whether it will alleviate the slow throughput or have any effect on what I believe is a (device-related) power management issue

- Windows power management is set to the High Performance plan (sleep and display timeouts set to Never/Never), but not until an hour ago did I change the advanced USB setting to disabled/off. The PCI device power saving setting was already off, which contradicts my theory. But maybe, just maybe, something is broken somewhere, or it could be a bug...

- the Emulex HBA, as it appears in Device Manager in the guest OS, does NOT have a Power Management tab for disabling the feature. I did see that the VMXNET3 adapter has this tab, and I turned off its power saving setting to be conservative, though I don't think it was the culprit, as there is no real data traversing the NIC; the data is merely being read from the mounted disk, not copied over any Ethernet wire. Nevertheless, within the Emulex properties there is a Details tab with a power data selection that shows which power mode the device is in, along with the supported power modes. D0 means full power, while D3, which shows as supported by the device, is a power saving mode. I have not yet been able to observe this setting during the slow activity, but I plan to as soon as possible; it is currently running at D0, full power. (A quick way to check the per-device power management toggle from inside the guest is sketched after this list.)
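
Since Device Manager hides the Power Management tab for the HBA, one way to see which devices in the guest still allow Windows to power them down is to query the MSPower_DeviceEnable class in the root\wmi namespace. This is just a read-only sketch, assuming the third-party wmi package (pip install wmi) is available inside the guest:

```python
# Read-only sketch: list every device exposing the
# "Allow the computer to turn off this device to save power" toggle,
# and whether it is currently enabled. Run inside the guest OS.
import wmi

conn = wmi.WMI(namespace="root\\wmi")
for dev in conn.MSPower_DeviceEnable():
    # InstanceName identifies the PnP device; Enable is True when Windows
    # is allowed to power the device down.
    print(dev.InstanceName, "-> power-down allowed:", dev.Enable)
```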

So this is what I have to offer; I hope it's enough. If I'm missing something, or if others have experienced this massive loss of throughput for no apparent reason and have resolutions, please share.

If there is a way for me to disable power management on the Emulex HBA, please let me know. I don't mind shutting down to get into the firmware, if it can even be done there. If it can, I would like to disable it immediately before any new tests are run.

2 Replies
kgottleib
Enthusiast

Still not a single reply after days...

Regardless - for those who may still reply - a working resolution has been discovered; however, the root cause has not.

The resolution to the above throughput issue: a CPU reservation. It doesn't matter what size. In fact, you don't even really need the reservation; keeping a VM remote console open also resolves the issue.
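
For anyone who wants to apply the workaround from a script rather than the vSphere Client, here is a minimal pyVmomi sketch that sets a small CPU reservation on the VM. The vCenter address, credentials, VM name, and the 500 MHz figure are all hypothetical placeholders, and task/error handling is left out:

```python
# Sketch: set a small CPU reservation on a VM via pyVmomi.
# Host, credentials, and VM name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab use only: skips cert checks
si = SmartConnect(host="vcenter.example.local",   # hypothetical vCenter
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)

# Walk the inventory to find the VM by name.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "backup-test-vm")   # hypothetical VM name

# The post suggests the size of the reservation does not matter much.
spec = vim.vm.ConfigSpec()
spec.cpuAllocation = vim.ResourceAllocationInfo(reservation=500)  # MHz
vm.ReconfigVM_Task(spec=spec)

Disconnect(si)
```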

Still trying to determine the root cause of this unfortunate behavior. VMware has something going on that is causing this problem; there is no reason why a VM should experience this at any time.

Also, although it looked like a power management issue, within the guest OS the HBA power status during the period of slow throughput showed D0, which means normal (full) power mode. So the slow throughput wasn't caused by the device power state in the guest OS.

Another interesting fact: I can log onto the guest OS via RDP and the throughput remains sluggish at 5 MB/sec; however, as soon as I launch a VM remote console, the throughput jumps up to 160 MB/sec.

A ticket has been opened with VMware support.

kgottleib
Enthusiast
Accepted Solution

The root cause of this issue was never identified by VMware support, which to me is an ugly matter. A CPU reservation for the VM using DirectPath I/O with the HBA installed in it corrected the problem. I would be leery of DirectPath I/O.
