VMware Cloud Community
EdZ314
Enthusiast

Windows LSI_SAS Driver, Storport hotfix and performance

While investigating a Windows 2008 R2 VM disk performance problem, I recently came across the following article about an updated Storport driver for Windows 2008 R2 that indicates it could significantly affect storage performance (see below). It looks really interesting, but I have a couple of questions: (1) does the LSI_SAS driver use Storport, and (2) has anyone tried this update?

Computer intermittently performs poorly or stops responding when the Storport driver is used in Windows Server 2008 R2

http://support.microsoft.com/kb/2468345

Consider the following scenario:

  • You install some high-performance storage devices on a computer that is running Windows Server 2008 R2. For example, you install a host-based RAID adapter or a Fibre Channel adapter that can access more than 4 gigabytes (GB) on a computer that is running Windows Server 2008 R2.
  • The Storport.sys driver is used to manage these storage devices.
  • The computer has more than 4 GB of physical memory.
  • You install an application or service that uses a large amount of memory on the computer.

    For example, you install SQL Server or Exchange Server on the computer.

In this scenario, the computer may intermittently perform poorly or stop responding for a while.

For example, the computer stops responding for 10-20 seconds.
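
In case it helps anyone checking their own guests, here is a minimal Python sketch of how you could confirm whether KB2468345 is installed and what storport.sys / lsi_sas.sys versions are on disk. It just shells out to the built-in wmic tool; the paths and tooling are assumed to be the Windows defaults:

```python
import subprocess

def run(cmd):
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Is the KB2468345 Storport hotfix installed on this guest?
print(run(["wmic", "qfe", "where", "HotFixID='KB2468345'",
           "get", "HotFixID,InstalledOn"]))

# What storport.sys / lsi_sas.sys versions are actually on disk?
for drv in ("storport.sys", "lsi_sas.sys"):
    # wmic's WQL filter wants the backslashes in the path doubled
    path = "c:\\\\windows\\\\system32\\\\drivers\\\\" + drv
    print(drv, run(["wmic", "datafile", "where",
                    "name='" + path + "'", "get", "Version"]))
```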

Details from VM problem:


> Several of these events: 129 LSI_SAS N/A N/A Reset to device, \Device\RaidPort3, was issued.

> Server had to be rebooted to clear the problem
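
For comparison, this is roughly how those 129 events can be pulled out of the System log with the built-in wevtutil tool (a sketch; the LSI_SAS provider name is taken from the event text above):

```python
import subprocess

# Pull the most recent Event ID 129 "Reset to device" entries from the
# System log, filtered to the LSI_SAS source shown in the events above.
query = "*[System[Provider[@Name='LSI_SAS'] and (EventID=129)]]"
result = subprocess.run(
    ["wevtutil", "qe", "System", "/q:" + query,
     "/f:text", "/c:50", "/rd:true"],
    capture_output=True, text=True)
print(result.stdout or result.stderr)
```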

1 Solution

Accepted Solutions
dexion111
Contributor

What version of the lsi_sas driver are you using? The one Windows 2008 thinks it should use (silly Windows)?

I have been dealing with this for nearly a year now. It's always the SQL servers: high I/O, high CPU, event 129s (resets), and boom, the box is gone.

I switched one of the machines to the VMware paravirtual SCSI adapter and I can't replicate the issue. That sent me to Google: "lsi_sas reset to device, switch to paravirtual fixes the issue". That brought up gold:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=206334...

I'm leaving that one machine on paravirtual, but I have changed other machines over to a newer driver I found on LSI's site, 1.34.3.0 from 6/13/2011, which is past the version where they say the error is fixed.

No recurrence on the new-driver boxes yet, but it could be too soon to tell. The paravirtual box, however, I consider a success: it would die twice every weekend, but hasn't since the new controller.

I've attached the driver; it's not the easiest to find.

View solution in original post

25 Replies
admin
Immortal

What is the underlying storage model on which the VM is running? Is it supported per the VMware HCL guide?

EdZ314
Enthusiast

Yes - it is supported. The storage is Symmetrix VMAX, connected through Cisco fibre switches.

Biomehanika
Contributor

Microsoft just asked us to implement this hotfix. Unfortunately, we now have the same question as you, and it is also unanswered here. Have you found out anything more in the meantime?

EdZ314
Enthusiast

No - I have not found out whether the SCSI driver uses Storport or not.

EdZ314
Enthusiast

Confirmed that the SAS adapter in ESXi 5.x uses the Storport driver, so the hotfix could be applicable.
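
For anyone who wants to verify this on their own guest, one way is to look at the import table of lsi_sas.sys and see whether it links against storport.sys. A quick sketch, assuming the third-party pefile package is available:

```python
import pefile  # pip install pefile

# A Storport miniport links against storport.sys, so the import table of
# lsi_sas.sys tells you which port driver the miniport actually uses.
pe = pefile.PE(r"C:\Windows\System32\drivers\lsi_sas.sys")
imports = [entry.dll.decode().lower()
           for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", [])]
print("imports:", imports)
print("uses Storport:", "storport.sys" in imports)
```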

Steve_King
Contributor

Did the hotfix fix the issue for you?  We're facing the same thing and there's not a lot of info out there on it.

EdZ314
Enthusiast

No - we continue to see the Event ID 129 storage timeout errors. We've tried a lot so far: updating the HBA drivers on ESX, ensuring proper Round Robin path policy on the LUNs, applying another Storport driver update on Windows, and checking the health of the storage array, cables, switches, etc. Everything checks out OK.

We actually have both Windows perfmon and esxtop captures from a couple of these events, and they show that the I/O on the Windows VM just drops to almost zero across every disk. We also split the VM disks across multiple virtual HBAs in the VM, and even though the disks are on separate virtual HBAs, they all stop sending data. Strangely, a small trickle of data continues on some of the disks, so it's not completely blocked. A reboot appears to be the only way to clear it up.

We've engaged VMware, Microsoft, and our storage vendor, and no one has been able to find a problem so far. Please post any details you feel may be important and we can compare. The most interesting thing is that this is only happening on one cluster out of several, which happens to have 4 x 8 Gb HBAs in the ESX hosts and runs extremely large VMs (up to 128 GB of RAM and 16 vCPUs).
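
For anyone digging through similar captures, this is roughly how we scan an esxtop batch-mode CSV for those stall windows. It's only a sketch - the filename and the "Commands/sec" column match are assumptions about how your capture is laid out:

```python
import csv

# Scan an esxtop batch-mode capture (esxtop -b > capture.csv) for sample
# intervals where every disk "Commands/sec" counter is (near) zero.
with open("capture.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = [i for i, name in enumerate(header) if "Commands/sec" in name]
    for row in reader:
        values = [float(row[i] or 0) for i in cols]
        if values and max(values) < 1.0:
            print("stall at", row[0])  # first column is the sample timestamp
```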

Steve_King
Contributor

Interesting.  We're in the same boat.  It only happens on one of the biggest SQL servers we have.  It's 12 cores/92 GB RAM, and it actually runs alone on a 12-core/96 GB host.  It usually happens during some high-I/O overnight processes, and a reboot is the only cure.  This has been hitting us for a year now -- about a dozen occurrences so far.

There are four hosts in the cluster running three of the big SQL servers, and only one VM has the problem.  The problem follows the VM, not the host.  A second VM with identical specs does not have the problem.  The VM is Windows 2008 R2 with SQL 2008 R2.


We're using Dell Compellent storage, and over the year this has been happening we've completely migrated from FC to 10 Gb iSCSI with no change in the behavior.  We've changed from the Fixed path policy to Round Robin.  We've disabled VAAI.  We've moved from four virtual SCSI controllers in the VM down to one, and back up to four.  We've updated from 5.0 to 5.1 and will be moving to 5.5 next week.  Nothing has helped.

Do your problem VMs have the same number of cores as your host like mine does?  I was thinking of trying to size the VM down to see if it would help.

EdZ314
Enthusiast

The profile of your VM is almost identical to the ones here. They are Windows 2008 R2 with SQL 2008 R2, and some Windows 2012 with SQL 2012. The number of vCPUs on the VMs varies from 4 to 16, so that doesn't seem to be related. We're using EMC VMAX storage. The biggest VM is 16 cores (4 sockets x 4 cores) with 130 GB of RAM. It's very intermittent - it occurs about every 2-3 weeks. We checked into the LSI HBA driver in Windows and it's end of life, so there are no newer updates available.

dexion111
Contributor

What version of the lsi_sas driver are you using? The one Windows 2008 thinks it should use (silly Windows)?

I have been dealing with this for nearly a year now. It's always the SQL servers: high I/O, high CPU, event 129s (resets), and boom, the box is gone.

I switched one of the machines to the VMware paravirtual SCSI adapter and I can't replicate the issue. That sent me to Google: "lsi_sas reset to device, switch to paravirtual fixes the issue". That brought up gold:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=206334...

I'm leaving that one machine on paravirtual, but I have changed other machines over to a newer driver I found on LSI's site, 1.34.3.0 from 6/13/2011, which is past the version where they say the error is fixed.

No recurrence on the new-driver boxes yet, but it could be too soon to tell. The paravirtual box, however, I consider a success: it would die twice every weekend, but hasn't since the new controller.

I've attached the driver; it's not the easiest to find.
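
If you want a quick way to see which lsi_sas driver a guest actually has loaded (handy before and after swapping in the 1.34.3.0 package), here is a small sketch that queries Win32_PnPSignedDriver through wmic; the "%LSI%" device-name match is just a guess at how the adapter shows up on your guests:

```python
import subprocess

# Report the LSI SAS driver version the guest actually has installed,
# as Device Manager sees it (Win32_PnPSignedDriver).
# The "%LSI%" device-name filter is an assumption; widen it if nothing matches.
result = subprocess.run(
    ["wmic", "path", "win32_pnpsigneddriver",
     "where", "DeviceName like '%LSI%'",
     "get", "DeviceName,DriverVersion,DriverDate"],
    capture_output=True, text=True)
print(result.stdout or result.stderr)
```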

EdZ314
Enthusiast

Excellent - this appears to be the exact issue we are seeing. We are on the older drivers also.

Steve_King
Contributor

Nice find and thanks for posting.

We dropped one of our boxes that was crashing multiple times per week from 12 vCPUs to 6, and it hasn't happened in two weeks.   That said, I'm sure it's just a matter of time.

Very cool to get a real fix.  This has been a bad one.

dexion111
Contributor

Yup, lots of late nights for me, with Compellent and VMware not having any idea. Odd that VMware support wouldn't know about their own KB... Please update if it does the trick. I think I'm fixed, but it's too soon to tell.

aymeric
Contributor

We had this error and updating the driver did the trick. Same experience with VMware support; it is a shame.

MSDS
Contributor

Currently having this exact issue with Server 2012 R2 in a VMware 5.5 environment. The driver it uses is version 1.34.3.82.

Has anyone else battled this problem with Server 2012 R2?

dexion111
Contributor

Not having the issue with 2012. However, I do know a VM on the host with the old driver will take out the entire host, one VM at a time. Do you have any VMs on there with 2008 and that driver?

MSDS
Contributor

If you get a chance, can you tell me which driver version you are using? Strangely enough, the only driver update that works on our test machine is the one that you posted, but we would love to find one for 2012 R2.

dexion111
Contributor

Sure - the driver my Windows 2012 R2 boxes are using is the one that comes with Windows, like the one you are using: 1.34.3.82 from 3/26/2013. So not much help there, I guess.

So far no issues at all, and it's been running about 6 months. It's an Exchange server.

MSDS
Contributor

Just to clarify, the disk we are having issues with is a 4 TB mapped raw LUN, along with the LSI SAS (physical bus sharing) virtual controller attached to it. We also have a second virtual controller with a virtual disk for the system drive, and that one is having no issues.

How do you have your Exchange server set up, controller-wise?
