Has anyone seen this message on the console before?
2:11:56:14:323 CPU0:1024)VMNIX: <0>scsi: device set offline - command error recovery failed: host 0 channel 0 id 0 lun 0
Thanks.
Any more information on this?
I have IBM x3650s with the Director agent, and they all show this message. I have applied all the BIOS and firmware updates I could find. No issues have resulted from these errors yet.
Anyone?
"Me too."
Two identically configured x3650s, no third-party software installed (not even IBM Director), just vanilla ESX. I was hopeful the RAID upgrade would fix it, but the drives are still going offline daily. We have not opened a support case yet but are about to with both IBM and VMware.
We have the same problem on an x3650 with the newest firmware
and ESX 3.0.1.
Is there a solution?
Kind regards.
Christian
I haven't found one yet, no. We're working slowly on the case with VMware due to my own workload.
We have tried the options from the following site:
We haven't installed the patch, but we have changed the PHY rate.
Hmm, that's not a good description of my symptoms. The system doesn't pause; the RAID goes totally offline and stays that way. The box doesn't reboot. No logs are written.
We have now installed the critical patches ESX-7302867 and ESX-1000039.
We haven't seen the error for four days.
Christian
All, we've found our root cause to be an issue with the IBM ServeRAID 8k and SCSI backplane. So anybody with IBM x3650 or x3655 hardware and RAID issues should check out this article to see if it fits:
I had this same problem. I'm running IBM HS20 (8843) blades that boot from local disks; all VMFS datastores are SAN-attached. I saw this thread and tried to find a BIOS/firmware update for the LSI 1030 RAID controller, but never found one. Once I got this console message - :00:03:06:880 CPU0:1024)VMNIX: <0>scsi: device set offline - not ready or command retry failed after bus reset: host 3 channel 0 id 0 lun 0 - I called support, but didn't really get anywhere.
I even went so far as to reinstall ESX, but the message persisted. I ended up upgrading the BIOS for the blade server from 1.09 to 1.10, and the message went away.
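For anyone trying to compare these console messages across hosts, here is a minimal Python sketch that pulls the reason and the host/channel/id/lun numbers out of a line like the ones quoted in this thread. The pattern and the function name `parse_offline_message` are my own assumptions, based only on the two messages posted here and not on any documented log format, so adjust them to whatever your console actually prints.

```python
import re

# Assumed pattern, derived only from the two messages quoted in this thread:
# "... scsi: device set offline - <reason>: host H channel C id I lun L"
OFFLINE_RE = re.compile(
    r"scsi:\s*device set offline\s*-\s*(?P<reason>.+?):\s*"
    r"host\s*(?P<host>\d+)\s*channel\s*(?P<channel>\d+)\s*"
    r"id\s*(?P<id>\d+)\s*lun\s*(?P<lun>\d+)",
    re.IGNORECASE,
)


def parse_offline_message(line):
    """Return a dict with reason/host/channel/id/lun, or None if no match."""
    match = OFFLINE_RE.search(line)
    if match is None:
        return None
    result = {"reason": match.group("reason").strip()}
    for key in ("host", "channel", "id", "lun"):
        result[key] = int(match.group(key))
    return result


if __name__ == "__main__":
    sample = (
        "CPU0:1024)VMNIX: <0>scsi: device set offline - not ready or command "
        "retry failed after bus reset: host 3 channel 0 id 0 lun 0"
    )
    print(parse_offline_message(sample))
```

The host/channel/id/lun values tell you which device went offline (local boot disk versus a SAN LUN, for example), which seems to be the main thing that differs between the reports here.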
Sounds like that reinforces the common cause being the disk controller.
I have seen this error as well. On an unsupported configuration, but still. In my case it was a faulty SCSI cable. This cable caused errors on the SCSI bus, and one of my ESX hosts decided to set the local boot disk to read-only. The result: the VMs kept running (amazing!), but after logging into the service console, every command ended in a "cannot read" error.
The problem was resolved by shutting down all VMs manually (via RDP), switching off the host, changing the cable and rebooting. The problem has not occurred since. So my advice would be to check your cables and make sure there is no communication problem between host and storage.
We got the same error too. Our machines are running a cluster with 2x IBM 3950 that boot off a SAN and share an additional LUN for VMs. We got the error when upgrading our switch fabrics.
We have multipath I/O via two switch fabrics, and I am confused why, when we upgraded one switch at a time, only one of the ESX hosts got this error. The other host is still running, and only the boot LUN is affected.
I hope it is the firmware, but at one customer site (you know who) the servers were updated to the latest firmware and BIOS last September, before we migrated the servers from ESX 2.5 to ESX 3.x. Then suddenly in November a disk failed, and the ESX server froze. These are IBM x366 machines.
I never witnessed this on HP, Dell or other vendors. Weird.