I recently lost all connectivity to my ESXi 5.1 hypervisor. I could not ping it, and I could not even get an ARP reply from it. After a reboot I can reach it again, but I want to find the cause.
I've had a quick look through the logs, but there are quite a few log files.
Can anyone point me in the right direction on what to check (e.g. the best log files) to find the root cause?
Thanks,
Is there anything in the iLO logs or in the vCenter (VC) logs?
For starters, look at /var/log/vmkwarning.log, followed by /var/log/vmkernel.log (although I doubt this will be useful, since the log might have been rotated).
You can also take a look at
vsish -e get /net/pNics/vmnic4/stats
device {
-- General Statistics:
Rx Packets:366079436
Tx Packets:186695760
Rx Bytes:116074983221
Tx Bytes:180071628822
Rx Errors:0
Tx Errors:0
Rx Dropped:0
Tx Dropped:0
to find out about any packet drops or abnormal NIC activity.
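If there are several vmnics to check, the counters can be scanned with a short script. This is only a sketch: it assumes the stats are in the "Name:number" form shown above, and the names parse_nic_stats and nonzero_problem_counters are made up for illustration. Keep in mind that counters read after the reboot may no longer reflect what happened during the outage.

```python
import re

# Sample text copied from the vsish output quoted above.
SAMPLE = """\
device {
-- General Statistics:
Rx Packets:366079436
Tx Packets:186695760
Rx Bytes:116074983221
Tx Bytes:180071628822
Rx Errors:0
Tx Errors:0
Rx Dropped:0
Tx Dropped:0
"""

# Matches lines such as "Rx Errors:0"; headers like "device {" are skipped.
STAT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)\s*$")


def parse_nic_stats(text):
    """Return {counter name: integer value} for every 'Name:number' line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def nonzero_problem_counters(stats):
    """Return only the error and drop counters that are not zero."""
    return {
        name: value
        for name, value in stats.items()
        if ("Errors" in name or "Dropped" in name) and value != 0
    }


if __name__ == "__main__":
    problems = nonzero_problem_counters(parse_nic_stats(SAMPLE))
    print(problems if problems else "no errors or drops recorded")
```

To use it on a real host, paste the output for each vmnic in place of SAMPLE, or read it from a file saved from the ESXi shell.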
Thanks,
I found the following logs up to the point where the device hung...
2014-03-22T16:25:20.462Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T16:55:20.529Z cpu2:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T17:25:20.600Z cpu0:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T17:55:20.651Z cpu0:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T18:25:20.703Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T18:55:20.747Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T19:25:20.793Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T19:55:20.836Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T20:25:20.882Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T20:55:20.910Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T21:25:20.946Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T21:55:20.998Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T22:05:30.811Z cpu5:215763)WARNING: UserLinux: 1331: unsupported: (void)
2014-03-22T22:25:21.055Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T22:55:21.110Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T23:01:03.030Z cpu5:4163)WARNING: VFAT: 4346: Failed to flush file times: Stale file handle
2014-03-22T23:25:21.155Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-22T23:55:21.208Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-23T00:01:02.570Z cpu4:4163)WARNING: VFAT: 4346: Failed to flush file times: Stale file handle
2014-03-23T00:25:21.252Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-23T00:55:21.305Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-23T01:25:21.339Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
2014-03-23T01:55:21.365Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0
0:00:00:04.194 cpu0:4096)WARNING: Cpu: 2165: Cache latency measurement may be inaccurate min= 180 max= 1044 avg= 214
0:00:00:04.217 cpu0:4096)WARNING: CacheSched: 803: Already disabled : Cache aware scheduling already disabled
0:00:00:04.312 cpu0:4096)WARNING: VMKAcpi: 495: No IPMI PNP id found
Any ideas?
Hi,
You need to go step by step, and you should not rule out the media or cable, the firewall, or the port.
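To pin down when the host stopped logging, one option is to pull out the last wall-clock entry that appears before the uptime-stamped lines. This is a rough sketch that assumes the lines stamped like 0:00:00:04.194 come from the boot that followed the hang; the function name last_entry_before_boot is hypothetical.

```python
import re

# Wall-clock stamp, e.g. "2014-03-23T01:55:21.365Z ".
ISO_STAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ")
# Uptime stamp, e.g. "0:00:00:04.194 " (assumed to mark a fresh boot).
UPTIME_STAMP = re.compile(r"^\d+:\d{2}:\d{2}:\d{2}\.\d+ ")

# Two lines copied from the log excerpt above.
SAMPLE = [
    '2014-03-23T01:55:21.365Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: '
    'IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0',
    '0:00:00:04.194 cpu0:4096)WARNING: Cpu: 2165: Cache latency '
    'measurement may be inaccurate min= 180 max= 1044 avg= 214',
]


def last_entry_before_boot(lines):
    """Return the last wall-clock line before the first uptime-stamped line.

    Returns None if no uptime-stamped line is found, or if no wall-clock
    line precedes it.
    """
    last = None
    for line in lines:
        if UPTIME_STAMP.match(line):
            return last
        if ISO_STAMP.match(line):
            last = line
    return None


if __name__ == "__main__":
    print(last_entry_before_boot(SAMPLE))
```

The timestamp of that line is only a lower bound on when the host hung, since the warnings here are logged roughly every half hour.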
0:00:00:04.194 cpu0:4096)WARNING: Cpu: 2165: Cache latency measurement may be inaccurate min= 180 max= 1044 avg= 214
0:00:00:04.217 cpu0:4096)WARNING: CacheSched: 803: Already disabled : Cache aware scheduling already disabled
0:00:00:04.312 cpu0:4096)WARNING: VMKAcpi: 495: No IPMI PNP id found
I would not worry about the above messages for now, but I would look at the messages related to ScsiDeviceIO. Your vmhba1 appears to be logging this warning at 30-minute intervals, which could be a periodic health check returning a bad state. Can you confirm what your vmhba1 is?
I checked, and vmhba1 is my internal Adaptec RAID card.
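The 30-minute spacing can be checked directly from the timestamps rather than by eye. The sketch below assumes the timestamp format shown in the excerpt; warning_times and gaps_in_minutes are illustrative names, and the sample holds only a few of the quoted lines.

```python
from datetime import datetime

# A few lines copied from the excerpt, including one unrelated warning.
SAMPLE = [
    '2014-03-22T16:25:20.462Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: '
    'IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0',
    '2014-03-22T16:55:20.529Z cpu2:5227)WARNING: ScsiDeviceIO: 6678: '
    'IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0',
    '2014-03-22T22:05:30.811Z cpu5:215763)WARNING: UserLinux: 1331: '
    'unsupported: (void)',
    '2014-03-22T17:25:20.600Z cpu0:5227)WARNING: ScsiDeviceIO: 6678: '
    'IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0',
]


def warning_times(lines, needle="ScsiDeviceIO"):
    """Return the parsed timestamps of every line that contains needle."""
    times = []
    for line in lines:
        if needle not in line:
            continue
        stamp = line.split(" ", 1)[0]
        try:
            times.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ"))
        except ValueError:
            # Skip lines without a wall-clock stamp (e.g. boot messages).
            continue
    return sorted(times)


def gaps_in_minutes(times):
    """Return the gap between consecutive timestamps, rounded to minutes."""
    return [
        round((later - earlier).total_seconds() / 60)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    print(gaps_in_minutes(warning_times(SAMPLE)))
```

A steady run of identical gaps points to something on a timer, such as a periodic poll of the device, rather than a fault that appeared shortly before the hang.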