Hi,
I have an IBM BladeCenter H with four HS22 blades installed.
I use vSphere Enterprise Plus with ESX 4.1.
This morning I found one cluster node down and all of its VMs powered off, and HA had not restarted them.
On the node, /var/log/messages contains many messages like these:
Oct 10 01:08:06 esx2 vobd: Oct 10 01:08:06.360: 5295395498800us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.600a0b800074cc480000014f4c47ba58 (Datastores: "VMDisk1430_A") restored. Path vmhba2:C0:T0:L0 is active again..
Oct 10 01:09:43 esx2 vobd: Oct 10 01:09:43.364: 5295492544743us: [vob.scsi.scsipath.por] Power-on Reset occurred on vmhba2:C0:T0:L0.
These messages started appearing on October 7, but the system only crashed today.
In /var/log/vmkernel I found many messages like these:
Oct 14 06:01:05 esx2 vmkernel: 65:11:49:34.674 cpu4:62021)WARNING: LinScsi: SCSILinuxAbortCommand: The driver failed to call done from itsabort handler and yet it returned SUCCESS
Oct 14 06:01:05 esx2 vmkernel: 65:11:49:34.674 cpu4:62021)WARNING: LinScsi: SCSILinuxAbortCommands: Failed, Driver qla2xxx, for vmhba2
Oct 14 06:01:05 esx2 vmkernel: 65:11:49:34.674 cpu4:62021)WARNING: ScsiPath: 5176: Set retry timeout for failed TaskMgmt abort for CmdSN 0x0, status Failure, path vmhba2:C0:T0:L0
Oct 14 06:01:05 esx2 vmkernel: 65:11:49:34.777 cpu11:62023) drv 8.46]
Oct 14 06:01:07 esx2 vmkernel: 65:11:49:36.514 cpu6:61981)ScsiDeviceIO: 1672: Command 0x16 to device "naa.600a0b800074cc480000014f4c47ba58" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Oct 14 06:01:07 esx2 vmkernel: 65:11:49:36.514 cpu6:61981)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "naa.600a0b800074cc480000014f4c47ba58" is blocked. Not starting I/O from device.
Oct 14 06:01:07 esx2 vmkernel: 65:11:49:36.514 cpu13:4135)WARNING: FS3: 7030: Reservation error: Timeout
This morning I fixed the problem by restarting the physical node from the IBM AMM.
After the restart, these HBA error messages no longer appeared.
What happened?
Why did HA not work?
One last note: when I connected to the machine's console, I saw this error screen (see attached file).
Can anyone help me?