I upgraded an ESXi 5.0 host with the 5.1 offline bundle, and then installed the 5.1.0a offline bundle. The host now reports:
~ # vmware -v
VMware ESXi 5.1.0 build-838463
Since then I have been seeing the following in /var/log/vmkernel.log:
2012-11-28T10:05:42.290Z cpu7:5039)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 19209 microseconds to 577155 microseconds.
2012-11-28T10:05:44.296Z cpu11:4107)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 19264 microseconds to 580562 microseconds.
2012-11-28T10:05:51.541Z cpu3:37757)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 19406 microseconds to 602629 microseconds.
2012-11-28T10:06:39.775Z cpu9:6338)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 20438 microseconds to 618588 microseconds.
2012-11-28T10:06:50.950Z cpu0:6338)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 20666 microseconds to 743760 microseconds.
2012-11-28T10:07:31.248Z cpu9:6339)ScsiDeviceIO: 1191: Device naa.600605b0049021e017579dc7047fff41 performance has improved. I/O latency reduced from 743760 microseconds to 145638 microseconds.
2012-11-28T10:07:31.930Z cpu9:37983)VSCSI: 2370: handle 8198(vscsi0:0):Reset request on FSS handle 5188905 (67 outstanding commands)
2012-11-28T10:07:31.930Z cpu0:4202)VSCSI: 2648: handle 8198(vscsi0:0):Reset [Retries: 0/0]
2012-11-28T10:07:31.930Z cpu0:4202)megasas: ABORT sn 1436974 cmd=0x8a retries=0 tmo=0
2012-11-28T10:07:31.930Z cpu0:4202)<5>0 :: megasas: RESET -1436974 cmd=8a retries=0
2012-11-28T10:07:31.930Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-11-28T10:07:31.994Z cpu1:4097)WARNING: LinScsi: SCSILinuxQueueCommand:1193:queuecommand failed with status = 0x1055 Host Busy vmhba1:2:0:0 (driver name: LSI Logic SAS based MegaRAID driver) - Message repeated 1 time
2012-11-28T10:07:31.994Z cpu1:4097)ScsiDeviceIO: 2303: Cmd(0x412400735100) 0x2a, CmdSN 0x800201a0 from world 6336 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-11-28T10:07:32.017Z cpu1:6335)ScsiDeviceIO: 2303: Cmd(0x41240077cd40) 0x2a, CmdSN 0x5542 from world 4173 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0.
2012-11-28T10:07:32.941Z cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line 2156: AFTER HBA reset handler invoked without an internal reset condition: took 1 seconds. Max is 180.
2012-11-28T10:07:32.941Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.
2012-11-28T10:07:32.941Z cpu0:4202)<5>megasas: reset successful
2012-11-28T10:07:32.941Z cpu0:4202)megasas: ABORT sn 1436980 cmd=0x2a retries=0 tmo=0
2012-11-28T10:07:32.941Z cpu0:4202)<5>0 :: megasas: RESET -1436980 cmd=2a retries=0
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-11-28T10:07:32.942Z cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line 2156: AFTER HBA reset handler invoked without an internal reset condition: took 0 seconds. Max is 180.
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.
2012-11-28T10:07:32.942Z cpu0:4202)<5>megasas: reset successful
2012-11-28T10:07:32.942Z cpu0:4202)megasas: ABORT sn 1437298 cmd=0x8a retries=0 tmo=0
2012-11-28T10:07:32.942Z cpu0:4202)<5>0 :: megasas: RESET -1437298 cmd=8a retries=0
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-11-28T10:07:32.942Z cpu0:4202)<7>megaraid_sas: megasas_wait_for_outstanding: line 2156: AFTER HBA reset handler invoked without an internal reset condition: took 0 seconds. Max is 180.
2012-11-28T10:07:32.942Z cpu0:4202)megaraid_sas: no more pending commands remain after reset handling.
2012-11-28T10:07:32.942Z cpu0:4202)<5>megasas: reset successful
Disk subsystem latency now periodically spikes, and the guest VMs become very slow as a result. I suspect the problem is in the new RAID controller driver. Where should I dig?
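To put numbers on the spikes, the "performance has deteriorated" warnings can be pulled out of vmkernel.log and turned into a ratio of current to average latency. This is only a rough sketch: the regular expression is written against the lines pasted above, and the function name latency_spikes is made up for illustration.

```python
import re

# Pattern written against the warning lines quoted in this thread;
# other ESXi builds may word the message differently.
LATENCY_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)WARNING: ScsiDeviceIO: \d+: "
    r"Device (?P<dev>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<cur>\d+) microseconds\."
)


def latency_spikes(lines):
    """Return one (timestamp, device, avg_ms, cur_ms, ratio) tuple per warning."""
    spikes = []
    for line in lines:
        m = LATENCY_RE.match(line.strip())
        if m is None:
            continue  # not a latency warning, e.g. "performance has improved"
        avg_us = int(m["avg"])
        cur_us = int(m["cur"])
        ratio = cur_us / avg_us if avg_us else float("inf")
        spikes.append((m["ts"], m["dev"], avg_us / 1000, cur_us / 1000, ratio))
    return spikes


if __name__ == "__main__":
    with open("/var/log/vmkernel.log") as log:
        for ts, dev, avg_ms, cur_ms, ratio in latency_spikes(log):
            print(f"{ts} {dev}: {avg_ms:.1f} ms -> {cur_ms:.1f} ms (x{ratio:.0f})")
```

Run against a copy of the log, this prints one line per warning, which makes it easier to see whether the spikes cluster around the controller resets.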
Have you installed the latest VIB for this card? See the attached file.
The release notes are here:
http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/MR_VMWare5_DRIVER-6.504.51.00.txt
Jon Munday wrote:
Have you installed the latest VIB for this card - see attached;
- scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib
Out of the box, my ESXi installation already has these drivers for the LSI card:
~ # esxcli software vib list | grep sas
scsi-megaraid-sas 5.34-4vmw.510.0.0.799733 VMware VMwareCertified 2012-11-27
scsi-mpt2sas 10.00.00.00-5vmw.510.0.0.799733 VMware VMwareCertified 2012-11-27
scsi-mptsas 4.23.01.00-6vmw.510.0.0.799733 VMware VMwareCertified 2012-11-27
The only thing I installed myself is the lsiprovider from the LSI website, to get sensor data in the vSphere Client:
~ # esxcli software vib list | grep lsi
lsiprovider 500.04.V0.32-0003 LSI VMwareAccepted 2012-11-27
As far as I can tell, the vmw.500.0.0 part in the name of the latest driver from the LSI website means that it is for ESXi 5.0,
but I could be wrong...
Is this driver (scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib) compatible with version 5.1.0a?
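The reading of the version string used above can be sketched in a few lines. This assumes, as guessed in this thread, that the three-digit token after "vmw." or "OEM." names the ESXi release the VIB was built for (500 for 5.0, 510 for 5.1); the helper name vib_build_target is hypothetical.

```python
import re

# Assumption taken from this thread: the digits right after "vmw." or "OEM."
# encode the ESXi release the package was built for.
_TARGET_RE = re.compile(r"(?:vmw|OEM)\.(\d)(\d)\d\.")


def vib_build_target(version):
    """Return the ESXi release as 'major.minor', or None if no token is found."""
    m = _TARGET_RE.search(version)
    if m is None:
        return None
    major, minor = m.groups()
    return f"{major}.{minor}"


if __name__ == "__main__":
    for v in (
        "scsi-megaraid-sas-6.504.51.00-1vmw.500.0.0.472560.x86_64.vib",
        "5.34-4vmw.510.0.0.799733",
    ):
        print(v, "->", vib_build_target(v))
```

Note that this only tells you which release a package was built for. It says nothing about whether a later release will accept it.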
The documentation on the LSI site is not consistent in all areas: some documents only list official support up to version 4.1, but the driver download is listed as 5.x (suggesting both 5.0 and 5.1). The VIB has 500 in the filename (suggesting 5.0) and the readme simply says VMware 5. Confusing, and just not cool.
Looking at the release notes, the current version is 6.504.51.00, and the following bug fixes and enhancements have been made since the version that you have (5.34):
SCGCQ00305931 - Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.
SCGCQ00299008 - Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.
SCGCQ00271239 - Changed version, release, and copyright dates to 2012.
SCGCQ00249268 - Add support for fpRead/WriteCapable & fpRead/WriteAcrossStripe for Thunderbolt/Invader.
SCGCQ00299009 - Fixed core dump problem by changing last argument of vmklnx_scsi_register_poll_handler() call.
SCGCQ00284156 - (Closed) - Needs to stop OS from issuing command when waiting for 180 seconds after abort.
SCGCQ00252233 - (Port_Complete) - Add support for fpRead/WriteCapable & fpRead/WriteAcrossStripe for Thunderbolt/Invader.
SCGCQ00252658 - (Completed) - To not set region lock type when FW informs driver to bypass lock.
I would personally upgrade this in the first instance and see if it addresses the issue.
Remember to reboot the host after this VIB has been installed
I updated the driver. At first glance all is well and the controller initialized. I'll keep watching (the real test is tomorrow, under production load). In any case, thanks for the advice.
How is it looking today? Hopefully all good.
Many, many thanks to you! Technically everything is fine )) Storage latency looks great and nothing is slow.
But yesterday, after the reboot, I saw some ugly-looking entries in vmkernel.log:
2012-11-28T21:29:58.512Z cpu8:4764)ScsiScan: 888: Path 'vmhba1:C0:T4:L0': Vendor: 'ATA ' Model: 'ST3000DM001-9YN1' Rev: 'CC4C'
2012-11-28T21:29:58.512Z cpu8:4764)ScsiScan: 891: Path 'vmhba1:C0:T4:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)
2012-11-28T21:29:58.513Z cpu8:4764)megasas_slave_configure: do not export physical disk devices to upper layer.
2012-11-28T21:29:58.513Z cpu8:4764)WARNING: ScsiScan: 1276: Failed to add path vmhba1:C0:T4:L0 : Not found
2012-11-28T21:29:58.514Z cpu8:4764)ScsiScan: 888: Path 'vmhba1:C0:T5:L0': Vendor: 'ATA ' Model: 'ST3000DM001-9YN1' Rev: 'CC4C'
2012-11-28T21:29:58.514Z cpu8:4764)ScsiScan: 891: Path 'vmhba1:C0:T5:L0': Type: 0x0, ANSI rev: 5, TPGS: 0 (none)
2012-11-28T21:29:58.515Z cpu8:4764)megasas_slave_configure: do not export physical disk devices to upper layer.
2012-11-28T21:29:58.515Z cpu8:4764)WARNING: ScsiScan: 1276: Failed to add path vmhba1:C0:T5:L0 : Not found
2012-11-28T21:29:58.567Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
2012-11-28T21:29:58.567Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
2012-11-28T21:29:58.567Z cpu8:4764)FSS: 4972: No FS driver claimed device 'control': Not supported
2012-11-28T21:29:58.568Z cpu8:4764)VC: 1547: Device rescan time 2 msec (total number of devices 6)
2012-11-28T21:29:58.568Z cpu8:4764)VC: 1550: Filesystem probe time 28 msec (devices probed 5 of 6)
2012-11-28T21:29:58.612Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
2012-11-28T21:29:58.612Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
2012-11-28T21:29:58.612Z cpu8:4764)FSS: 4972: No FS driver claimed device 'control': Not supported
2012-11-28T21:29:58.617Z cpu8:4764)VC: 1547: Device rescan time 2 msec (total number of devices 6)
2012-11-28T21:29:58.617Z cpu8:4764)VC: 1550: Filesystem probe time 31 msec (devices probed 5 of 6)
2012-11-28T21:29:58.654Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
2012-11-28T21:29:58.654Z cpu8:4764)Vol3: 692: Couldn't read volume header from control: Not supported
However, my datastore is up and the VMs are working.
Now, under high load, the only thing I see periodically is this:
2012-11-29T12:00:20.117Z cpu2:6367)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x85 (0x4124008086c0, 5249) to dev "naa.600605b0049021e017579dc7047fff41" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2012-11-29T12:00:20.117Z cpu2:6367)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x85, CmdSN 0x57 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-11-29T12:00:20.117Z cpu2:4491)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x4d, CmdSN 0x58 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2012-11-29T12:00:20.117Z cpu2:4098)ScsiDeviceIO: 2316: Cmd(0x4124008086c0) 0x1a, CmdSN 0x59 from world 5249 to dev "naa.600605b0049021e017579dc7047fff41" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
All systems are still running stably; I hope this continues.
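For anyone trying to read these "failed H:... D:... P:..." lines, here is a small decoder sketch. The opcode, status and sense tables hold only a handful of standard SCSI values (the ones that appear in this thread plus a few common ones), and the regular expression is written against the ScsiDeviceIO lines pasted here, not the NMP line. The function name decode_failed_command is made up.

```python
import re

# Small excerpts of the standard SCSI code tables, not complete lists.
OPCODES = {0x1A: "MODE SENSE(6)", 0x2A: "WRITE(10)", 0x4D: "LOG SENSE",
           0x85: "ATA PASS-THROUGH(16)", 0x8A: "WRITE(16)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0xB: "ABORTED COMMAND"}
ADDITIONAL_SENSE = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
                    (0x24, 0x00): "INVALID FIELD IN CDB"}

_FAILED_RE = re.compile(
    r"Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .* failed "
    r"H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) "
    r"(?P<kind>Valid|Possible) sense data: "
    r"(?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_failed_command(line):
    """Translate one ScsiDeviceIO 'failed' line into readable names."""
    m = _FAILED_RE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    dev = int(m["d"], 16)
    result = {
        "opcode": OPCODES.get(op, hex(op)),
        "device_status": DEVICE_STATUS.get(dev, hex(dev)),
        # "Possible sense data" means the sense bytes are not reliable.
        "sense_valid": m["kind"].lower() == "valid",
    }
    if result["sense_valid"]:
        key = int(m["key"], 16)
        asc = (int(m["asc"], 16), int(m["ascq"], 16))
        result["sense_key"] = SENSE_KEYS.get(key, hex(key))
        result["additional_sense"] = ADDITIONAL_SENSE.get(asc, "%#x/%#x" % asc)
    return result
```

Decoded this way, the three lines above are ATA PASS-THROUGH(16), LOG SENSE and MODE SENSE(6) commands that the logical volume rejected with ILLEGAL REQUEST. In other words, the device refused commands it does not support; these are not failed reads or writes. The lines do not show what is sending those commands.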
Good news ... feel free to mark this as answered if all is good
Done.
After installing the new driver I get this error in my guests:
Reset to device, \Device\Raidport0, was issued.
mtimley wrote:
After installing the new driver I get this error in my guests:
Reset to device, \Device\Raidport0, was issued.
On your host, run:
vmware -v
esxcli software vib list | grep lsi
What OS are your guests running? Which VM hardware version? Are VMware Tools current?
I have the same issue, but without upgrading to 5.1.0:
vmware -v
VMware ESXi 5.0.0 build-623860
esxcli software vib list | grep sas
scsi-megaraid-sas 5.34-1vmw.500.1.11.623860 VMware VMwareCertified 2012-07-22
scsi-mpt2sas 06.00.00.00-6vmw.500.1.11.623860 VMware VMwareCertified 2012-07-22
scsi-mptsas 4.23.01.00-5vmw.500.0.0.469512 VMware VMwareCertified 2012-07-22
Host server:
IBM x3550 M4
RAID adapter:
ServeRAID M5110 SAS/SATA Controller
Guest server:
Windows Server 2008 R2 SP1
Issue in guest log:
Source: LSI_SAS
Event ID: 129
Warning: Reset to device, \Device\RaidPort0, was issued.
The system becomes very slow when this happens.
Restarting the guest solves the problem, but after a few days it happens again, and again...
After installing the latest driver from lsi.com, part of the problem went away. But every time I copy or move a file larger than 100 MB, the problem comes back:
2012-12-06T11:07:34.635Z cpu9:6369)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13622 microseconds to 426696 microseconds.
2012-12-06T11:07:39.861Z cpu4:6367)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13627 microseconds to 859780 microseconds.
2012-12-06T11:07:40.142Z cpu3:4099)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13628 microseconds to 2000809 microseconds.
2012-12-06T11:07:40.187Z cpu3:5063)WARNING: ScsiDeviceIO: 1211: Device naa.600605b0049021e017579dc7047fff41 performance has deteriorated. I/O latency increased from average value of 13629 microseconds to 4054825 microseconds.
2012-12
When this happens, all my guest OSes slow down too.
After some fixes I get better results, both for read/write I/O latency and for the warning "performance has deteriorated. I/O latency increased from average value". What I did:
- installed the latest firmware for the RAID controller
- installed the latest ESXi driver for megaraid
- installed the latest scsi-mpt2sas driver
# ./MegaCli -AdpAllInfo -aAll | grep FW
FW Package Build: 12.13.0-0154
FW Version : 2.130.383-2315
# esxcli software vib list | grep sas
scsi-mpt2sas 16.00.00.00.1vmw-1OEM.500.0.0.472560 LSI VMwareCertified 2013-04-11
scsi-megaraid-sas 6.506.51.00.1vmw-1vmw.500.0.0.472560 VMware VMwareCertified 2013-04-09
scsi-mptsas 4.23.01.00-6vmw.510.0.0.799733 VMware VMwareCertified 2012-11-27
- changed the cache configuration on the RAID (switched to write-back, enabled the disk cache and the read cache)
# ./MegaCli -LDInfo -L0 -a0
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
With these settings the latency of my datastore improved roughly tenfold, but I still have some read latency issues (random 300-millisecond jumps when idle). I no longer see any warning messages about the datastore.
I don't have a battery for the controller, so I am running these settings at some risk.
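The risky parts of this cache setup can be spotted automatically from the MegaCli -LDInfo output. A minimal sketch, written only against the field names shown in the output pasted above; the function name cache_risks and the wording of the messages are my own.

```python
def cache_risks(ldinfo_text):
    """List cache settings in MegaCli -LDInfo output that can lose data on power loss."""
    fields = {}
    for line in ldinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only "Name: value" style lines
            fields[key.strip()] = value.strip()

    policy = [p.strip() for p in fields.get("Current Cache Policy", "").split(",")]
    risks = []
    if "WriteBack" in policy and "Write Cache OK if Bad BBU" in policy:
        risks.append("write-back cache stays enabled without a healthy BBU")
    if fields.get("Disk Cache Policy") == "Enabled":
        risks.append("on-disk write cache is enabled")
    return risks


if __name__ == "__main__":
    sample = """\
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Disk Cache Policy : Enabled
"""
    for risk in cache_risks(sample):
        print("RISK:", risk)
```

With the settings shown above, both checks fire. That matches the trade-off described: lower latency at the cost of possibly losing cached writes on a power failure.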