VMware Cloud Community
jpiscaer
Enthusiast

LSI SMI-S causing high latency?

I've also posted this on my blog, VirtualLifestyle.nl.

I've been having a lot of latency issues lately with two Dell PowerEdge R310s. These 1U boxes have a low-end controller, a PERC H200. I've been seeing latency spikes in the range of 500-600ms, which is high enough to make the Linux VMs continuously remount their filesystems read-only. This basically happens any time any of the VMs does moderate I/O (say, 25+ IOPS), and causes the controller to lock up and take down multiple other VMs along the way. It also happens during any operation on the controller itself, like formatting a disk with VMFS, creating a snapshot, consolidating a disk or removing a snapshot.

As you can imagine, performance was abysmal, and something needed to be done. I've been monitoring guests to see if something inside the Guest OS caused the controller to skid out of control (I even down- and upgraded Linux kernels and changed guest filesystems), I have tried different mpt2sas driver versions, different advanced storage settings and many other things, but in the end, nothing really helped.

Until I spotted this post on the Dell Community by a user called 'damirc':

Well, found my solution.
It seems that the LSI SMI-S provider (the health provider for the vSphere console) is not too comfortable with Dell PERC H200 (or LSI 9211-8i) and seriously slows down disk i/o.

Worth a try, right? I removed the VIB ('lsiprovider') and rebooted the host. And hey presto, I could easily push the SSD and H200 controller north of 4,000 IOPS with sub-10ms latency without any issue, which is pretty good in my view, and it certainly is a substantial improvement over the latency spikes and horribly low IOPS before. After a couple of hours of testing and monitoring, the previously mentioned issues seem to have completely disappeared with the SMI-S provider removed.
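For reference, the removal itself is just a couple of commands on the host; note that the VIB name may differ between OEM packages, so check the list output before removing anything:

```shell
# Confirm the LSI SMI-S provider VIB is installed
# (the name may vary per OEM build -- verify before removing)
esxcli software vib list | grep -i lsiprovider

# Remove the provider VIB, then reboot the host for the change to take effect
esxcli software vib remove -n lsiprovider
reboot
```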

Now, I'm very curious whether others have similar experiences with the LSI SMI-S provider in conjunction with a Dell PERC H200 or H310. I can't find any confirmed cases (only some unconfirmed ones: HP EVA SMI-S Provider Collection Latency Issue and Dell R210 II alternative SATA/SAS RAID controllers?).

I've filed a support request with LSI (P00099195) to find out whether this is a confirmed bug. Does it apply to a specific VIB version, a specific controller (OEM version, firmware version), or something else?

I will keep monitoring the issue and doing some more testing on a spare host that still has the issue to see if I can narrow it down. I'll post an update here if appropriate.

Cheers, Joep Piscaer VMware vExpert 2009 Virtual Lifestyle: http://www.virtuallifestyle.nl Twitter: http://www.twitter.com/jpiscaer LinkedIn: http://www.linkedin.com/in/jpiscaer **If you found this information useful, please consider awarding points**
5 Replies
jomeyIT
Contributor

Hello,

I have exactly the same problem.

Since early 2014 with ESXi 5.5, and now with ESXi 5.5 Update 1.

Dell server with PERC H200 and RAID1.

I tested different LSI SMI-S versions, no luck!

The host is absolutely slow (VMs, host LAN transfers, ...) with the CIM provider installed.

Same workaround as you.

Have you ever used/tested an LSI controller driver for the PERC H200?

I have no solution yet, sorry 😞

jpiscaer
Enthusiast

I have asked LSI and some people inside VMware if they have any more information on this, but it's hard to uncover anything further. LSI Support did get back to me, stating:

According to LSI Engineering department, this latency is caused by a bug in the hypervisor. The bug should be fixed in vSphere 5.1 Update 3 and 5.5 Update 2.

It seems this issue will be fixed in an upcoming release of vSphere, so I guess we need to use the workaround until then and hope the fix actually makes the 5.5 Update 2 release. I'm wondering whether this issue is LSI-specific, or a bug that more widely affects other SMI-S providers, too.

Cheers, Joep Piscaer VMware vExpert 2009 Virtual Lifestyle: http://www.virtuallifestyle.nl Twitter: http://www.twitter.com/jpiscaer LinkedIn: http://www.linkedin.com/in/jpiscaer **If you found this information useful, please consider awarding points**
Bleeder
Hot Shot

There was a new LSI SMIS Provider released last month that might fix this issue.  If you go to the LSI page for any controller (example: http://www.lsi.com/products/raid-controllers/pages/megaraid-sas-9260-8i.aspx) and then expand Management Software and Tools, it should be there.

jomeyIT
Contributor

Hello Bleeder,

Two weeks ago I tested this new LSI SMI-S provider.

No luck, same problems.

Thanks for the input.


MuadDib_007
Contributor

Hi,

I just installed a Dell R610 with PERC H700 512MB and BBU, 4x 146GB SAS Drives in RAID5.

Running ESXi 5.5 update 2 from 09-09-2014.

FW versions could be outdated (3 years old).

Write speed is poor: 10 MB/s at best.

Read speed is much better, around 179 MB/s.

So probably the same issue as you are facing.

I tried to remove the VIB named lsiprovider, but I cannot locate it on my ESXi installation.
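In case it helps, here is roughly what I ran to look for it; I searched broadly because, as far as I understand, the provider may ship under a different name in some OEM packages:

```shell
# List every installed VIB and search broadly, since the provider name
# can vary between OEM packages (e.g. 'lsiprovider' or similar)
esxcli software vib list | grep -i -e lsi -e smis -e provider

# As a test, the CIM agent itself can be stopped to rule the health
# providers in or out (temporary; it comes back after a reboot)
/etc/init.d/sfcbd-watchdog stop
```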

Maybe you have some other advice for me?

Also if there is anything I could test on my server for this matter that is no problem.

For now it is still being used as a test environment.

Regards,
