VMware Cloud Community
Felix001
Contributor

Troubleshooting an ESXi 5 failure

I recently lost all connectivity to my ESXi 5.1 hypervisor. I was unable to ping it and could not even get an ARP reply. After a reboot I can reach it again, but I want to track down the cause.

I've had a quick look through the logs, but there are quite a few log files.

Can anyone point me in the right direction on what to check (e.g. the most useful log files) to find the root cause?

Thanks,

6 Replies
khaliqamar
Enthusiast

Is there anything in the iLO logs or in the VC (vCenter) logs?

zXi_Gamer
Virtuoso

For starters, look at /var/log/vmkwarning, followed by /var/log/vmkernel (though I doubt the latter will be useful, since the log might have been rotated).
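With that many log files, a quick tally of warnings per subsystem can show where to look first. The following is a minimal sketch, assuming Python 3 on a workstation and a copy of the warning log pulled from the host; the function name `count_warnings` is made up for illustration, and the pattern is based only on the line format seen in this thread.

```python
import re
from collections import Counter

# Matches the "WARNING: <Subsystem>: <number>:" part of a vmkernel-style line,
# e.g. 'cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device ...'.
WARNING_RE = re.compile(r"WARNING: (?P<subsystem>[^:]+): \d+:")


def count_warnings(lines):
    """Count warnings per subsystem in an iterable of log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("subsystem")] += 1
    return counts
```

To use it, open the copied log file, pass the file object to `count_warnings`, and print `counts.most_common()` to see the noisiest subsystems at the top.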

You can also take a look at

vsish -e get /net/pNics/vmnic4/stats

device {

   -- General Statistics:

   Rx Packets:366079436

   Tx Packets:186695760

   Rx Bytes:116074983221

   Tx Bytes:180071628822

   Rx Errors:0

   Tx Errors:0

   Rx Dropped:0

   Tx Dropped:0

to check for packet drops or other abnormal NIC activity.
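If you save that output to a file, the error and drop counters can be checked automatically. This is only a sketch, assuming Python 3 on a workstation and the "Name:value" layout shown in the output above; `parse_counters` and `suspicious_counters` are hypothetical helper names, not part of any VMware tool.

```python
import re

# One counter per line, e.g. "   Rx Dropped:0". Header lines such as
# "device {" or "-- General Statistics:" do not match and are skipped.
COUNTER_RE = re.compile(r"^\s*(?P<name>[A-Za-z ]+?):\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Turn saved statistics output into a {counter name: integer value} dict."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def suspicious_counters(counters):
    """Return only the error/drop counters that are above zero."""
    return {
        name: value
        for name, value in counters.items()
        if ("Errors" in name or "Dropped" in name) and value > 0
    }
```

An empty result from `suspicious_counters` means the NIC reported no errors or drops in that snapshot. Comparing two snapshots taken some time apart would be more telling than a single one.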

Felix001
Contributor

Thanks,

I found the following log entries leading up to the point where the device hung:

2014-03-22T16:25:20.462Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T16:55:20.529Z cpu2:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T17:25:20.600Z cpu0:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T17:55:20.651Z cpu0:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T18:25:20.703Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T18:55:20.747Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T19:25:20.793Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T19:55:20.836Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T20:25:20.882Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T20:55:20.910Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T21:25:20.946Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T21:55:20.998Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T22:05:30.811Z cpu5:215763)WARNING: UserLinux: 1331: unsupported: (void)

2014-03-22T22:25:21.055Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T22:55:21.110Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T23:01:03.030Z cpu5:4163)WARNING: VFAT: 4346: Failed to flush file times: Stale file handle

2014-03-22T23:25:21.155Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-22T23:55:21.208Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-23T00:01:02.570Z cpu4:4163)WARNING: VFAT: 4346: Failed to flush file times: Stale file handle

2014-03-23T00:25:21.252Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-23T00:55:21.305Z cpu3:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-23T01:25:21.339Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

2014-03-23T01:55:21.365Z cpu4:5227)WARNING: ScsiDeviceIO: 6678: IEC page to device "mpx.vmhba1:C0:T0:L0" has bad pagecode: 0x0

0:00:00:04.194 cpu0:4096)WARNING: Cpu: 2165: Cache latency measurement may be inaccurate min= 180 max= 1044 avg= 214

0:00:00:04.217 cpu0:4096)WARNING: CacheSched: 803: Already disabled : Cache aware scheduling already disabled

0:00:00:04.312 cpu0:4096)WARNING: VMKAcpi: 495: No IPMI PNP id found

Any ideas?

rachelsg
Enthusiast

Hi,

You need to go step by step and should not overlook the media or cabling, the firewall, or the port.

zXi_Gamer
Virtuoso

0:00:00:04.194 cpu0:4096)WARNING: Cpu: 2165: Cache latency measurement may be inaccurate min= 180 max= 1044 avg= 214

0:00:00:04.217 cpu0:4096)WARNING: CacheSched: 803: Already disabled : Cache aware scheduling already disabled

0:00:00:04.312 cpu0:4096)WARNING: VMKAcpi: 495: No IPMI PNP id found

I would not worry about the messages above for now. The ones to focus on are the warnings related to ScsiDeviceIO: your vmhba1 appears to be logging them at almost exactly 30-minute intervals, which could point to a periodic health check returning a bad state. Can you confirm which device vmhba1 is?
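The 30-minute pattern can be confirmed by computing the gaps between consecutive ScsiDeviceIO warnings. Here is a rough sketch, assuming Python 3 on a workstation and the ISO-style timestamps shown in the log excerpt earlier in the thread; `warning_times` and `intervals_in_minutes` are hypothetical names chosen for this example.

```python
from datetime import datetime

# Timestamp layout of the runtime log lines, e.g. "2014-03-22T16:25:20.462Z".
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def warning_times(lines, needle="ScsiDeviceIO"):
    """Collect timestamps of lines containing `needle`."""
    times = []
    for line in lines:
        fields = line.split()
        if needle not in line or not fields:
            continue
        try:
            times.append(datetime.strptime(fields[0], STAMP_FORMAT))
        except ValueError:
            # Early boot lines use an uptime-style stamp instead; skip them.
            continue
    return times


def intervals_in_minutes(times):
    """Gaps between consecutive timestamps, in minutes."""
    return [
        (later - earlier).total_seconds() / 60.0
        for earlier, later in zip(times, times[1:])
    ]
```

If every gap comes out close to 30, that supports the idea of a periodic poll rather than a random fault. A gap that is noticeably longer just before the hang would be worth a closer look.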

Felix001
Contributor

I checked, and vmhba1 is my internal Adaptec RAID card.
