I'm using the drivers from the VMware KB over here: http://www.3ware.com/kb/article.aspx?id=15416 and I can't seem to get decent transfer rates either. The Linux guests log a message like
"mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated"
in their dmesg logs whenever the ESXi message log gives
"Dec 21 23:33:10 vmkernel: 20:07:31:13.409 cpu0:5759053)<4>3w-9xxx: scsi1: WARNING: (0x06:0x002c): Unit #0: Command (0x2a) timed out, resetting card.
Dec 21 23:33:10 vmkernel: 20:07:31:26.316 cpu0:5759053)<4>3w-9xxx: scsi1: AEN: INFO (0x04:0x005e): Cache synchronization completed:unit=0."
Any bright ideas?
I have this card and have never seen this problem. I would suggest first making sure the 9650SE's firmware is up to date before you spend any more time troubleshooting ... check the obvious things, too: bad disk, bad cable, poor connection, etc.
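For what it's worth, firmware and unit health can be checked with the tw_cli utility from 3ware's codeset. Something like the following (the controller/port IDs are assumptions; substitute your own from `tw_cli show`):

```
# Show firmware, driver and model for controller 0
tw_cli /c0 show firmware driver model

# Unit and port status overview (look for degraded units or errors)
tw_cli /c0 show

# Detailed status for the first drive, including reallocated sectors
tw_cli /c0/p0 show all
```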
I've seen these errors in ESXi 4 too. I use 3ware's VMware-certified driver, but I have two hosts and the problem occurs on both (completely different hardware)! The firmware on the 3ware 9690SA cards is the latest, and 3ware support has no idea what's happening...
Aug 25 13:17:38 vmkernel: 0:06:13:50.767 cpu3:4172)<4>3w-9xxx:8:0:1:0 :: WARNING: (0x06:0x002c): Command (0x2a) timed out, resetting card.
Aug 25 13:17:58 vmkernel: 0:06:14:11.475 cpu1:4172)<4>3w-9xxx: scsi8: AEN: INFO (0x04:0x005e): Cache synchronization completed:unit=1.
Aug 25 13:19:14 vmkernel: 0:06:15:26.799 cpu1:84014)<4>3w-9xxx:8:0:1:0 :: WARNING: (0x06:0x002c): Command (0x2a) timed out, resetting card.
Aug 25 13:19:35 vmkernel: 0:06:15:47.515 cpu3:84014)<4>3w-9xxx: scsi8: AEN: INFO (0x04:0x005e): Cache synchronization completed:unit=1.
Aug 25 15:17:34 vmkernel: 0:08:13:47.049 cpu0:84381)<4>3w-9xxx:8:0:1:0 :: WARNING: (0x06:0x002c): Command (0x2a) timed out, resetting card.
Aug 25 15:17:55 vmkernel: 0:08:14:08.203 cpu2:84381)<4>3w-9xxx: scsi8: AEN: INFO (0x04:0x005e): Cache synchronization completed:unit=1.
Aug 25 15:17:56 vmkernel: 0:08:14:08.899 cpu1:107109)<4>3w-9xxx:8:0:1:0 :: WARNING: (0x06:0x002c): Command (0x2a) timed out, resetting card.
Aug 25 15:18:16 vmkernel: 0:08:14:29.107 cpu3:107109)<4>3w-9xxx: scsi8: AEN: INFO (0x04:0x005e): Cache synchronization completed:unit=1.
Same problem here with ESXi 4 and a 3ware 9690SA. The problem is reliably triggered by running IOMeter 2006.07.27 in a Windows Server 2003 VM with this workload on a 32 GB vmdk:
Block size = 8 KB, alignment = 8 KB
50% read/write distribution
100% random access
Outstanding I/Os (queue depth) = 256
Timeouts and resets occur at least every 5-10 minutes while the test is running.
Disks are 4 x Seagate ST31000340NS (SN06 firmware; according to 3ware, upgrading to BN06 in a 4-disk configuration is not required); RAID5 (tried both 256K and 64K stripe, no change).
Installed Ubuntu 9.04 on the physical machine for testing. Unfortunately IOMeter does not really work on Linux, but with fio the adapter does not get reset even with some insane parameters (like --iodepth=40960).
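For reference, the IOMeter access spec above translates fairly directly into an fio job file, roughly like this (the filename and runtime are placeholders I've picked, not values from the thread):

```ini
; approximates the IOMeter workload: 8 KB blocks,
; 50/50 read/write, 100% random, queue depth 256
[iometer-like]
ioengine=libaio
direct=1
filename=/mnt/test/fio.dat
size=32g
bs=8k
rw=randrw
rwmixread=50
iodepth=256
runtime=600
time_based
```

Run it with `fio iometer-like.fio` while watching the vmkernel log (or dmesg on bare metal) for the timeout/reset messages.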
What are your cache settings set to?
What are your background disk check/rebuild settings set to?
cache settings:
We use a BBU with the card, so the write cache was ON, but there were many problems (card resets), so we had to turn it OFF... after this there were fewer card resets, but we still get them.
background disk check/rebuild:
Auto-verify was also turned ON, but because of the many card resets we decided to turn it OFF.
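In case it helps anyone else get to the same state, both settings can be toggled from tw_cli. A rough sketch (the unit number `u0` is an assumption; check yours with `tw_cli /c0 show`):

```
# Disable the write cache on unit 0
tw_cli /c0/u0 set cache=off

# Disable scheduled auto-verify on unit 0
tw_cli /c0/u0 set autoverify=off

# Check BBU health while you're at it
tw_cli /c0/bbu show status
```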
Linux on the physical hardware works, so I am guessing it might be the ESXi driver.
Does the error still occur with the ESX version?
From your complaints it does not appear to work under ESXi even though 3ware supports it.
Our development guys are considering getting this controller for their ESXi system.
I am advising them to get a 9690 until I learn how this issue gets resolved.
Lucas,
We're actually experiencing the same behavior on the 9690SA as well, so it might be a driver issue. Has anyone tried to open a support ticket with 3ware about this?
Hello,
I contacted support, but they can't reproduce these errors... I sent them all the information about our systems, and support said they would try to build the same configuration to test it... I've been waiting for about 1.5 months now!
Just out of curiosity, what kind of RAID setup are you experiencing this on, and what kind of hard drives are they?
In our case we have ESXi 3.5 Update 3 running (booting) off a RAID-10 volume of four 1 TB Seagate ST31000640SS drives. The thing is, we can almost reliably PSOD an entire host by introducing heavy disk I/O (in terms of IOPS, not raw linear reads), and the PSOD is preceded by command timeouts on the volume and VSCSIFS busy messages.