freeman_camion
Contributor

Problem with LSI 9260-4i after upgrading ESXi from 5.0 to 5.1.0a

I installed the 5.1 offline bundle on ESXi 5.0, and after that the 5.1.0a offline bundle. Now I have:

~ # vmware -v
VMware ESXi 5.1.0 build-838463

And I see something interesting in /var/log/vmkernel.log:

2012-11-28T10:05:42.290Z cpu7:5039)WARNING: ScsiDeviceIO: 1211: Device  naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O  latency increased from average value of 19209 microseconds to 577155  microseconds.
2012-11-28T10:05:44.296Z cpu11:4107)WARNING:  ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41  performance has deteriorated. I/O latency increased from average value  of 19264 microseconds to 580562 microseconds.
2012-11-28T10:05:51.541Z  cpu3:37757)WARNING: ScsiDeviceIO: 1211: Device  naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O  latency increased from average value of 19406 microseconds to 602629  microseconds.
2012-11-28T10:06:39.775Z cpu9:6338)WARNING:  ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41  performance has deteriorated. I/O latency increased from average value  of 20438 microseconds to 618588 microseconds.
2012-11-28T10:06:50.950Z  cpu0:6338)WARNING: ScsiDeviceIO: 1211: Device  naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O  latency increased from average value of 20666 microseconds to 743760  microseconds.
2012-11-28T10:07:31.248Z cpu9:6339)ScsiDeviceIO: 1191:  Device naa.600605b0049021e017579dc7047fff41 performance has improved.  I/O latency reduced from 743760 microseconds to 145638 microseconds.
2012-11-28T10:07:31.930Z  cpu9:37983)VSCSI: 2370: handle 8198(vscsi0:0):Reset request on FSS  handle 5188905 (67 outstanding commands)
2012-11-28T10:07:31.930Z cpu0:4202)VSCSI: 2648: handle 8198(vscsi0:0):Reset [Retries: 0/0]                       
2012-11-28T10:07:31.930Z cpu0:4202)megasas: ABORT sn 1436974 cmd=0x8a retries=0 tmo=0                            
2012-11-28T10:07:31.930Z cpu0:4202)<5>0 :: megasas: RESET -1436974 cmd=8a retries=0                              
2012-11-28T10:07:31.930Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-11-28T10:07:31.994Z  cpu1:4097)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand  failed with status = 0x1055 Host Busy vmhba1:2:0:0 (driver name: LSI  Logic SAS based MegaRAID driver) - Message repeated 1 time
2012-11-28T10:07:31.994Z  cpu1:4097)ScsiDeviceIO: 2303: Cmd(0x412400735100) 0x2a, CmdSN  0x800201a0 from world 6336 to dev "naa.600605b0049021e017579dc7047fff41"  failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-11-28T10:07:32.017Z  cpu1:6335)ScsiDeviceIO: 2303: Cmd(0x41240077cd40) 0x2a, CmdSN 0x5542  from world 4173 to dev "naa.600605b0049021e017579dc7047fff41" failed  H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-11-28T10:07:32.941Z  cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line  2156: AFTER HBA reset handler invoked without an internal reset  condition:   took 1 seconds. Max is 180.
2012-11-28T10:07:32.941Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.
2012-11-28T10:07:32.941Z cpu0:4202)<5>megasas: reset successful                                                      
                                                                                                                     
2012-11-28T10:07:32.941Z cpu0:4202)megasas: ABORT sn 1436980 cmd=0x2a retries=0 tmo=0                                
2012-11-28T10:07:32.941Z cpu0:4202)<5>0 :: megasas: RESET -1436980 cmd=2a retries=0                                  
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-11-28T10:07:32.942Z  cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line  2156: AFTER HBA reset handler invoked without an internal reset  condition:   took 0 seconds. Max is 180.
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.               
2012-11-28T10:07:32.942Z cpu0:4202)<5>megasas: reset successful                                                      
                                                                                                                     
2012-11-28T10:07:32.942Z cpu0:4202)megasas: ABORT sn 1437298 cmd=0x8a retries=0 tmo=0
2012-11-28T10:07:32.942Z cpu0:4202)<5>0 :: megasas: RESET -1437298 cmd=8a retries=0                                  
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.      
2012-11-28T10:07:32.942Z  cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line  2156: AFTER HBA reset handler invoked without an internal reset  condition:   took 0 seconds. Max is 180.
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.               
2012-11-28T10:07:32.942Z cpu0:4202)<5>megasas: reset successful

Now the latency of the disk subsystem periodically spikes, and the guest VMs become very slow as a result. I suspect the problem is in the new RAID controller driver. Where should I dig?
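To see how large the spikes in a log excerpt like the one above are, the latency messages can be pulled out with a short script. This is only a sketch: the regular expression is written against the two ScsiDeviceIO message shapes quoted in this post ("performance has deteriorated" and "performance has improved") and is an assumption about their format, not an official log grammar. The function and field names are made up for illustration.

```python
import re

# Matches the two ScsiDeviceIO latency messages quoted in the post.
# The pattern is an assumption based on those sample lines only.
LATENCY_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)(?:WARNING:\s*)?ScsiDeviceIO:\s*\d+:\s*"
    r"Device\s+(?P<dev>naa\.[0-9a-f]+)\s+performance\s+has\s+"
    r"(?P<kind>deteriorated|improved)\.\s+I/O\s+latency\s+"
    r"(?:increased\s+from\s+average\s+value\s+of|reduced\s+from)\s+"
    r"(?P<before>\d+)\s+microseconds\s+to\s+(?P<after>\d+)\s+microseconds"
)


def parse_latency_events(lines):
    """Return one dict per latency message found in the given log lines."""
    events = []
    for line in lines:
        m = LATENCY_RE.search(line.strip())
        if m:
            events.append({
                "time": m.group("ts"),
                "device": m.group("dev"),
                "kind": m.group("kind"),
                "before_us": int(m.group("before")),
                "after_us": int(m.group("after")),
            })
    return events


def worst_latency_us(events):
    """Largest reported latency among the 'deteriorated' events, or None."""
    spikes = [e["after_us"] for e in events if e["kind"] == "deteriorated"]
    return max(spikes) if spikes else None
```

On the host this could be fed with the lines of /var/log/vmkernel.log; lines that are not latency messages (for example the megasas reset lines) are simply skipped.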

Accepted Solutions
jrmunday
Commander

Have you installed the latest VIB for this card? See attached:

  • scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib

See the release notes:

http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/MR_VMWare5_DRIVER-6.504.51.00.txt

vExpert 2014 - 2018 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77

freeman_camion
Contributor

Jon Munday wrote:

Have you installed the latest VIB for this card? See attached:

  • scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib

Out of the box, my ESXi already has new drivers for the LSI card:

~ # esxcli software vib list | grep sas
scsi-megaraid-sas              5.34-4vmw.510.0.0.799733            VMware   VMwareCertified     2012-11-27 
scsi-mpt2sas                   10.00.00.00-5vmw.510.0.0.799733     VMware   VMwareCertified     2012-11-27 
scsi-mptsas                    4.23.01.00-6vmw.510.0.0.799733      VMware   VMwareCertified     2012-11-27 

I only installed the lsiprovider from the LSI website, to get sensor data in the vSphere Client:

~ # esxcli software vib list | grep lsi
lsiprovider                    500.04.V0.32-0003                   LSI      VMwareAccepted      2012-11-27

As I understand it, the vmw.500.0.0 part in the name of the latest driver on the LSI website means that it is for ESXi 5.0, but I could be wrong...

Is this driver (scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib) compatible with version 5.1.0a?
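The naming question can be made concrete with a small helper that pulls the three-digit tag out of a VIB file name or version string. It is a sketch resting on the same guess as the post: that the digits after "vmw." (or "OEM.") name the ESXi release the VIB was built for, so 500 reads as 5.0 and 510 as 5.1. That mapping is an assumption from this thread, not something taken from VMware documentation, and release_tag is a made-up name.

```python
import re

# Assumption from the thread (not from VMware documentation): the three
# digits after "vmw." or "OEM." name the ESXi release the VIB was built
# against, e.g. 500 -> 5.0 and 510 -> 5.1.
TAG_RE = re.compile(r"-\d+(?:vmw|OEM)\.(\d)(\d)\d\.")


def release_tag(vib_name_or_version):
    """Return the presumed ESXi release ('5.0', '5.1', ...) or None."""
    m = TAG_RE.search(vib_name_or_version)
    if m is None:
        return None
    return f"{m.group(1)}.{m.group(2)}"


if __name__ == "__main__":
    for text in (
        "scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib",
        "5.34-4vmw.510.0.0.799733",
    ):
        print(text, "->", release_tag(text))
```

The tag only says what the package was built against. Whether a 5.0-tagged driver is supported on 5.1 still has to be checked with the vendor or the compatibility guide.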

jrmunday
Commander

The documentation on the LSI site is not consistent in all areas: some documents only list official support up to version 4.1, but the driver download is listed as 5.x (suggesting both 5.0 and 5.1). The VIB has 500 in the filename (suggesting 5.0) and the readme simply says VMware 5. Confusing, and just not cool.

Looking at the release notes, the current version is 6.504.51.00, and the following bug fixes and enhancements have been added since the version that you have (5.34):


SCGCQ00305931  -  Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.
SCGCQ00299008  -  Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.
SCGCQ00271239  -  Changed version, release, and copyright dates to 2012.
SCGCQ00249268  -  Add support for fpRead/WriteCapable & fpRead/WriteAcrossStripe for Thunderbolt/Invader.
SCGCQ00299009  -  Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.

SCGCQ00284156 - (Closed) - Needs to stop OS from issuing command when waiting for 180 seconds after abort.
SCGCQ00252233 - (Port_Complete) - Add support for fpRead/WriteCapable & fpRead/WriteAcrossStripe for Thunderbolt/Invader.
SCGCQ00252658 - (Completed) - To not set region lock type when FW informs driver to bypass lock.

I would personally upgrade this in the first instance and see if it addresses the issue.

freeman_camion
Contributor
Maybe you're right. Thanks, I'll try.
jrmunday
Commander

Remember to reboot the host after this VIB has been installed :)

freeman_camion
Contributor

I updated the driver. At first glance, all is well: the controller initializes. I'll keep watching (tomorrow I'll see how it behaves under high production load). In any case, thanks for the advice :)

jrmunday
Commander

How is it looking today? Hopefully all good.

freeman_camion
Contributor

Many, many thanks to you! Technically all is fine )) The latency of the storage system looks good and nothing is slow.

But yesterday, after the reboot, I saw some ugly entries in the vmkernel log:

2012-11-28T21:29:58.512Z cpu8:4764)ScsiScan: 888: Path 'vmhba1:C0:T4:L0': Vendor: 'ATA     '  Model: 'ST3000DM001-9YN1'  Rev: 'CC4C'
2012-11-28T21:29:58.512Z cpu8:4764)ScsiScan: 891: Path 'vmhba1:C0:T4:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)                                                                                                                             
2012-11-28T21:29:58.513Z cpu8:4764)megasas_slave_configure: do not export physical disk devices to upper layer.                                                                                                                              
2012-11-28T21:29:58.513Z cpu8:4764)WARNING: ScsiScan: 1276: Failed to add path vmhba1:C0:T4:L0 : Not found                                                                                                                                   
2012-11-28T21:29:58.514Z cpu8:4764)ScsiScan: 888: Path 'vmhba1:C0:T5:L0': Vendor: 'ATA     '  Model: 'ST3000DM001-9YN1'  Rev: 'CC4C'                                                                                     
2012-11-28T21:29:58.514Z cpu8:4764)ScsiScan: 891: Path 'vmhba1:C0:T5:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)                                                                                                                             
2012-11-28T21:29:58.515Z cpu8:4764)megasas_slave_configure: do not export physical disk devices to upper layer.                                                                                                                              
2012-11-28T21:29:58.515Z cpu8:4764)WARNING: ScsiScan: 1276: Failed to add path vmhba1:C0:T5:L0 : Not found                                                                                                                                   
2012-11-28T21:29:58.567Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported                                                                                                                           
2012-11-28T21:29:58.567Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported                                                                                                                                        
2012-11-28T21:29:58.567Z cpu8:4764)FSS: 4972: No FS driver claimed device 'control': Not supported                                                                                                                                           
2012-11-28T21:29:58.568Z cpu8:4764)VC: 1547: Device rescan time 2 msec (total number of devices 6)                                                                                                                                           
2012-11-28T21:29:58.568Z cpu8:4764)VC: 1550: Filesystem probe time 28 msec (devices probed 5 of 6)                                                                                                                                           
2012-11-28T21:29:58.612Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported                                                                                                                                        
2012-11-28T21:29:58.612Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported                                                                                                                                        
2012-11-28T21:29:58.612Z cpu8:4764)FSS: 4972: No FS driver claimed device 'control': Not supported                                                                                                                                           
2012-11-28T21:29:58.617Z cpu8:4764)VC: 1547: Device rescan time 2 msec (total number of devices 6)                                                                                                                                           
2012-11-28T21:29:58.617Z cpu8:4764)VC: 1550: Filesystem probe time 31 msec (devices probed 5 of 6)                                                                                                                                           
2012-11-28T21:29:58.654Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported                                                                                                                                        
2012-11-28T21:29:58.654Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported

but my datastore is up and the VMs are working.

Now, under high load, I periodically see only this:

2012-11-29T12:00:20.117Z cpu2:6367)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x85 (0x4124008086c0, 5249) to dev "naa.600605b0049021e017579dc7047fff41" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2012-11-29T12:00:20.117Z cpu2:6367)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x85, CmdSN 0x57 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-11-29T12:00:20.117Z cpu2:4491)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x4d, CmdSN 0x58 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-11-29T12:00:20.117Z cpu2:4098)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x1a, CmdSN 0x59 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

All systems are still running stably; I hope this will continue.
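The H/D/P and sense-data fields in the ScsiDeviceIO lines above can be decoded with the standard SCSI code tables. The sketch below carries only the handful of codes that appear in this thread. The table contents are standard SCSI values, but the regular expression and the decode name are illustrative assumptions based on the quoted lines. Decoded this way, the failing commands are ATA PASS-THROUGH(16), LOG SENSE and MODE SENSE(6) requests rejected with ILLEGAL REQUEST. That looks like management queries that the RAID volume does not support rather than failed data I/O, though that reading is an interpretation and not something confirmed in the thread.

```python
import re

# Small excerpts of the standard SCSI code tables; only the values
# that occur in this thread are listed.
OPCODES = {
    0x1A: "MODE SENSE(6)",
    0x2A: "WRITE(10)",
    0x4D: "LOG SENSE",
    0x85: "ATA PASS-THROUGH(16)",
    0x8A: "WRITE(16)",
}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}

# Written against the "ScsiDeviceIO: ... Cmd(0x...) 0x85, ..." lines
# quoted in the thread; other message shapes are not matched.
LINE_RE = re.compile(
    r"Cmd\(0x[0-9a-f]+\)\s+(?P<op>0x[0-9a-f]+),.*?"
    r"H:0x[0-9a-f]+\s+D:(?P<d>0x[0-9a-f]+)\s+P:0x[0-9a-f]+\s+"
    r"(?:Valid|Possible)\s+sense\s+data:\s+"
    r"(?P<key>0x[0-9a-f]+)\s+(?P<asc>0x[0-9a-f]+)\s+(?P<ascq>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def _name(table, code):
    return table.get(code, f"unknown (0x{code:x})")


def decode(line):
    """Translate one ScsiDeviceIO failure line into readable names."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, dev, key, asc, ascq = (
        int(m.group(g), 16) for g in ("op", "d", "key", "asc", "ascq")
    )
    return {
        "opcode": _name(OPCODES, op),
        "device_status": _name(DEVICE_STATUS, dev),
        "sense_key": _name(SENSE_KEYS, key),
        "additional_sense": ASC_ASCQ.get(
            (asc, ascq), f"unknown (0x{asc:x}/0x{ascq:x})"
        ),
    }
```

The same function applied to the earlier D:0x8 lines from before the driver update reports WRITE(10) commands returning BUSY, which is a different situation from the ILLEGAL REQUEST responses here.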

jrmunday
Commander

Good news... feel free to mark this as answered if all is good :)

freeman_camion
Contributor

Done. ;)

mtimley
Contributor

After installing the new driver I got this error on my guests:

Reset to device, \Device\Raidport0, was issued.

freeman_camion
Contributor

mtimley wrote:

After installing the new driver I got this error on my guests:

Reset to device, \Device\Raidport0, was issued.

On your host, run:

vmware -v

esxcli software vib list | grep lsi

What OS do your guests run? Which VM hardware versions? Are VMware Tools up to date?

AMDDJA
Contributor

I have the same issue, but without upgrading to 5.1.0.

vmware -v

VMware ESXi 5.0.0 build-623860

esxcli software vib list | grep sas


scsi-megaraid-sas      5.34-1vmw.500.1.11.623860           VMware   VMwareCertified   2012-07-22
scsi-mpt2sas           06.00.00.00-6vmw.500.1.11.623860    VMware   VMwareCertified   2012-07-22
scsi-mptsas            4.23.01.00-5vmw.500.0.0.469512      VMware   VMwareCertified   2012-07-22

Host server:

IBM x3550 M4

RAID adapter:

ServeRAID M5110 SAS/SATA Controller

Guest server:

Windows Server 2008 R2 SP1

Issue in guest log:

Source: LSI_SAS

Event ID: 129

Warning: Reset to device, \Device\RaidPort0, was issued.

The system runs very slowly when this happens.

Restarting the guest solves the problem, but after a few days it happens again and again...

freeman_camion
Contributor

After installing the latest driver from lsi.com, part of the problem has gone. But every time I copy or move any file larger than 100 MB, I get this problem again:

2012-12-06T11:07:34.635Z cpu9:6369)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13622 microseconds to 426696 microseconds.
2012-12-06T11:07:39.861Z cpu4:6367)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13627 microseconds to 859780 microseconds.             
2012-12-06T11:07:40.142Z cpu3:4099)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13628 microseconds to 2000809 microseconds.            
2012-12-06T11:07:40.187Z cpu3:5063)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13629 microseconds to 4054825 microseconds.            
2012-12

When this happens, all my guest OSes slow down too.

freeman_camion
Contributor

After some fixes I have better results with read/write I/O latency and with the warning "performance has deteriorated. I/O latency increased from average value". What I did:

- installed the latest firmware for the RAID controller

- installed the latest ESXi driver for megaraid

- installed the latest scsi-mpt2sas driver

# ./MegaCli -AdpAllInfo -aAll | grep FW
FW Package Build: 12.13.0-0154
FW Version         : 2.130.383-2315

# esxcli software vib list | grep sas
scsi-mpt2sas                   16.00.00.00.1vmw-1OEM.500.0.0.472560  LSI     VMwareCertified     2013-04-11 
scsi-megaraid-sas              6.506.51.00.1vmw-1vmw.500.0.0.472560  VMware  VMwareCertified     2013-04-09 
scsi-mptsas                    4.23.01.00-6vmw.510.0.0.799733        VMware  VMwareCertified     2012-11-27

- changed the cache configuration on the RAID (switched to write-back, enabled the disk cache and read cache)

# ./MegaCli -LDInfo -L0 -a0

Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled

With these settings the latency of my datastore improved by a factor of 10, but I still have some read latency issues (random 300-millisecond spikes when idle). I no longer see any warning messages about the datastore.

I don't have a battery for the controller, so I am using all of these settings at some risk.
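The risk mentioned here can be read directly from the MegaCli -LDInfo output quoted above: write-back that stays enabled with a bad or missing BBU, and an enabled disk cache, both keep unwritten data in volatile memory that is lost on a power failure. The sketch below flags those two settings. The function name and messages are made up, and it only understands the "Key: value" lines shown in this post.

```python
def cache_risks(ldinfo_text):
    """List cache settings in MegaCli -LDInfo output that can lose
    data on power failure when the controller has no working battery."""
    fields = {}
    for line in ldinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    current = [
        part.strip()
        for part in fields.get("Current Cache Policy", "").split(",")
    ]
    risks = []
    # Write-back is kept even when the battery is bad or absent.
    if "WriteBack" in current and "Write Cache OK if Bad BBU" in current:
        risks.append("controller write-back cache is active without BBU protection")
    # The drives' own write cache is volatile as well.
    if fields.get("Disk Cache Policy", "").startswith("Enabled"):
        risks.append("disk cache is enabled")
    return risks
```

An empty list means neither setting was found in the text; it does not prove the configuration is safe.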
