History:
I've had a cluster of three identical ESXi servers (version 5.5.0 U1, Dell PowerEdge R610) managed by vCenter for more than one year.
The cluster ran correctly until January.
The first incidents affected Linux VMs: the file system failed and was remounted read-only. Rebooting the VM fixed it.
The next incidents were crashes of an ESXi host (SSH to the host still answered, but any command typed at the prompt hung with no response):
-> VMs on that host were no longer visible or accessible.
-> HA (host failure, VM and application monitoring) did not detect the problem and did not restart the affected VMs.
-> However, once I had to shut down and restart the host, HA did trigger.
Later, another ESXi host in the cluster crashed with the same problem and the same workaround.
After that, ESXi hosts in the cluster crashed again...
The only relevant entries I found in the logs ("/var/log/vmkwarning.log") were:
2016-03-05T14:20:01.487Z cpu11:34017119)ALERT: hostd detected to be non-responsive
...
2016-03-06T16:00:01.403Z cpu1:34258627)ALERT: hostd detected to be non-responsive
2016-03-06T16:35:01.761Z cpu12:34264098)ALERT: hostd detected to be non-responsive
So I decided to completely rebuild a fresh cluster with the same servers and the latest version of ESXi and vCenter, 5.5.0 U3.
The datastores were reconnected and the VMs brought back up.
Everything went well for a week...
Then I had the unpleasant surprise of another ESXi host crash,
with the same events in the logs before the crash ("/var/log/vmkwarning.log"):
2016-03-15T15:10:02.206Z cpu12:1969007)ALERT: hostd detected to be non-responsive
2016-03-15T15:49:02.025Z cpu11:1974883)ALERT: hostd detected to be non-responsive
---
I don't know what to do or where to look to find the cause.
Is it possible for a VM to crash an ESXi host?
Thank you for your ideas and your support,
Bye.
GEO
Please share more details, such as the storage device type and the ESXi log files.
Hello ^^
Thank you for your response.
More details on the storage:
the ESXi hosts and VMs are connected to a Dell PS4001 over iSCSI (without encryption).
Log files are difficult to share in full,
but here is what appears around "ALERT: hostd detected to be non-responsive":
2016-03-15T13:16:20.036Z cpu2:465325)WARNING: CBT: 2060: Unsupported ioctl 60
2016-03-15T13:16:20.036Z cpu2:465325)WARNING: CBT: 2060: Unsupported ioctl 59
2016-03-15T13:16:20.036Z cpu2:465325)WARNING: CBT: 2060: Unsupported ioctl 43
2016-03-15T13:45:10.541Z cpu6:32803)WARNING: ScsiDeviceIO: 1223: Device naa.603be8bf6ea1fec5e553f50100002004 performance has deteriorated. I/O latency increased from average value of 43368 microseconds to 909475 microseconds.
2016-03-15T13:45:16.733Z cpu5:32802)WARNING: ScsiDeviceIO: 1223: Device naa.603be8bf6ea1fec5e553f50100002004 performance has deteriorated. I/O latency increased from average value of 43369 microseconds to 878421 microseconds.
2016-03-15T14:22:08.356Z cpu1:33024)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.603be8bf6ea1fec5e553f50100002004" state in doubt; requested fast path state update...
2016-03-15T14:22:19.388Z cpu13:32810)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.603be8bf6ea1fec5e553f50100002004" state in doubt; requested fast path state update...
2016-03-15T14:22:19.413Z cpu3:46602)WARNING: ScsiDeviceIO: 1223: Device naa.603be8bf6ea1fec5e553f50100002004 performance has deteriorated. I/O latency increased from average value of 43318 microseconds to 1121555 microseconds.
2016-03-15T14:22:19.414Z cpu3:46602)WARNING: ScsiDeviceIO: 1223: Device naa.603be8bf6ea1fec5e553f50100002004 performance has deteriorated. I/O latency increased from average value of 43318 microseconds to 2499496 microseconds.
2016-03-15T14:22:35.508Z cpu13:32810)WARNING: ScsiDeviceIO: 1223: Device naa.603be8bf6ea1fec5e553f50100002004 performance has deteriorated. I/O latency increased from average value of 43320 microseconds to 5924671 microseconds.
2016-03-15T14:48:26.356Z cpu14:32941)WARNING: VSCSI: 2850: handle 8468(vscsi0:1):Retry 0 overdue by 2 seconds
2016-03-15T14:48:27.276Z cpu12:464394)WARNING: VSCSI: 3573: handle 8467(vscsi0:0):WaitForCIF: Issuing reset; number of CIF:1
2016-03-15T14:48:27.276Z cpu12:464394)WARNING: VSCSI: 2495: handle 8467(vscsi0:0):Ignoring double reset
2016-03-15T15:10:02.206Z cpu12:1969007)ALERT: hostd detected to be non-responsive
2016-03-15T15:49:02.025Z cpu11:1974883)ALERT: hostd detected to be non-responsive
TSC: 779379 cpu0:1)WARNING: ACPI: 1386: SPCR: Detected unsupported reg bit width (0); will assume 8 bits.
TSC: 785046 cpu0:1)WARNING: ACPI: 1435: SPCR: Detected invalid baud rate (0); will assume 115200
0:00:00:00.000 cpu0:1)WARNING: Serial: 646: Invalid serial port config: mem-mapped to addr 0x0.
2016-03-15T16:19:49.870Z cpu7:33382)WARNING: LinuxSignal: 538: ignored unexpected signal flags 0x2 (sig 17)
2016-03-15T16:19:51.139Z cpu10:33262)WARNING: Team.etherswitch: TeamES_Activate:692: Failed to initialize beaconing on portset 'pps': Not implemented.
2016-03-15T16:19:54.773Z cpu4:33421)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535
2016-03-15T16:19:54.774Z cpu4:33421)WARNING: LinScsiLLD: scsi_add_host:573: vmkAdapter (usb-storage) sgMaxEntries rounded to 255. Reported size was 65535
2016-03-15T16:20:07.976Z cpu10:33433)WARNING: ScsiScan: 1408: Failed to add path vmhba0:C0:T0:L0 : Not found
2016-03-15T16:20:07.979Z cpu10:33433)WARNING: ScsiScan: 1408: Failed to add path vmhba0:C0:T1:L0 : Not found
2016-03-15T16:20:12.377Z cpu12:33262)WARNING: NetDVS: 567: portAlias is NULL
2016-03-15T16:20:12.514Z cpu12:33262)WARNING: Tcpip_Vmk: 808: Failed to set default gateway (51): Network unreachable
2016-03-15T16:20:12.517Z cpu12:33262)WARNING: NetDVS: 567: portAlias is NULL
2016-03-15T16:20:12.520Z cpu12:33262)WARNING: Tcpip_Vmk: 808: Failed to set default gateway (51): Network unreachable
2016-03-15T16:20:12.523Z cpu12:33262)WARNING: NetDVS: 567: portAlias is NULL
2016-03-15T16:20:12.526Z cpu12:33262)WARNING: Tcpip_Vmk: 808: Failed to set default gateway (51): Network unreachable
2016-03-15T16:20:12.529Z cpu12:33262)WARNING: NetDVS: 567: portAlias is NULL
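In the excerpt above, the ScsiDeviceIO warnings show device naa.603be8bf6ea1fec5e553f50100002004 going from an average latency of about 43 ms to several seconds in the run-up to the hostd alerts, which may be worth correlating over a longer period. A minimal Python sketch for pulling those lines out of a larger log, assuming the message format is exactly as quoted in this thread (other ESXi builds may word it differently). The function name latency_spikes and the command-line usage are hypothetical, and the script is meant to run on a copy of the log on a workstation, not on the host itself.

```python
import re
import sys

# Assumption: matches the ScsiDeviceIO latency warning exactly as quoted in this thread, e.g.
# "...)WARNING: ScsiDeviceIO: 1223: Device naa.... performance has deteriorated.
#  I/O latency increased from average value of 43368 microseconds to 909475 microseconds."
LATENCY_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)WARNING: ScsiDeviceIO: \d+: "
    r"Device (?P<dev>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<now>\d+) microseconds\."
)


def latency_spikes(lines):
    """Return (timestamp, device, avg_ms, new_ms, ratio) for each latency warning."""
    spikes = []
    for line in lines:
        m = LATENCY_RE.match(line.strip())
        if not m:
            continue  # ignore every other kind of warning (NMP, VSCSI, hostd, ...)
        avg_us = int(m.group("avg"))
        new_us = int(m.group("now"))
        spikes.append((m.group("ts"), m.group("dev"),
                       avg_us / 1000.0, new_us / 1000.0, new_us / float(avg_us)))
    return spikes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass a copy of vmkwarning.log (or vmkernel.log) as the first argument.
    with open(sys.argv[1]) as log:
        for ts, dev, avg_ms, new_ms, ratio in latency_spikes(log):
            print("%s %s avg=%.1f ms now=%.1f ms (x%.0f)" % (ts, dev, avg_ms, new_ms, ratio))
```

For example, running it as `python latency_spikes.py vmkwarning.log` (file name hypothetical) prints one line per warning with the average and new latency in milliseconds, which makes it easier to line up storage latency spikes against the timestamps of the hostd alerts.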
What other files would help?
Thank you very much.
Hello,
I've updated to 5.5.0 U3 but still have the same problem.
Just to say that I resolved my problem by downgrading the ESXi servers from version 5.5.0 U3 to 5.5.0.
Could it be a hardware problem?
Nevertheless, thank you.