Hi everyone,
I have a few ESXi 5.0 hosts connecting to a NAS over iSCSI.
Each server has enough physical memory for all VMs to use all of their allocated memory.
When the load is a bit high (around 100 ms latency) and the host is swapping (even as little as ~2 MB), the host tends to stop responding:
- Some of the VMs hang; others are fine.
- I can no longer telnet to port 443.
- ls /vmfs/volumes on the host hangs.
The remaining hosts are not swapping, so they have no problem.
I have tried the following steps:
- /etc/init.d/hostd restart
- /etc/init.d/vpxa restart
- /etc/init.d/wsman restart
- esxcfg-rescan vmhba35 (my iSCSI adapter), but this failed with "Error: Unable to scan VMkernel SCSI subsystem for old devices. Scan already in progress"
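Since both ls /vmfs/volumes and the rescan can block indefinitely in this state, one way to probe them over SSH without losing the session is to run each command in the background and give up after a few seconds. This is only a rough sketch: probe_with_timeout is a made-up helper name, and it assumes a POSIX-style shell with mktemp and sleep available on the host.

```shell
# Hypothetical helper: run a command in the background and report whether it
# finished within LIMIT seconds. It never waits on the command itself, so the
# calling shell stays usable even if the command is stuck on storage I/O.
probe_with_timeout() {
    limit=$1
    shift
    marker=$(mktemp)   # empty file; the subshell fills it in when the command ends
    ( "$@" >/dev/null 2>&1; echo "$?" > "$marker" ) &
    waited=0
    while :; do
        if [ -s "$marker" ]; then
            status=$(cat "$marker")
            rm -f "$marker"
            return "$status"      # the command's own exit status
        fi
        if [ "$waited" -ge "$limit" ]; then
            return 124            # still running: treat as hung (marker file is left behind)
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example use on the host (not executed here):
#   probe_with_timeout 5 ls /vmfs/volumes || echo "datastore listing is stuck or failed"
```

A return code of 124 here only means the command had not finished in time; it does not fix or kill the stuck command.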
A reboot solves the problem, but I don't want to reboot.
I don't have direct access to the DCUI.
Any help is appreciated.
Thanks so much.
BTW, storage I/O and management I/O are on different vdSwitches.
It looks like it's a bug in ESXi 5.0.
Have a look at this blog post:
http://vmtoday.com/2012/02/vsphere-5-networking-bug-affects-software-iscsi/
I found that server 2, running v5.0.0 Update 1, is fine.
It could recover the iSCSI session.
Server 1, running v5.0.0 (GA), couldn't recover the iSCSI session, so it failed.
I have updated server 1 to v5.0.0 Update 2.
I will know within a day whether Update 2 mitigates the problem.
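When comparing hosts like this, it helps to confirm which build each one is actually running. A minimal sketch, assuming the version banner has the form "VMware ESXi 5.0.0 build-NNNNNN" (as printed by vmware -v); esxi_build is a made-up helper name and the sample string below is only illustrative.

```shell
# Hypothetical helper: pull the numeric build out of a "vmware -v" style string.
# Prints nothing if the string does not contain "build-<digits>".
esxi_build() {
    printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Example use on each host (not executed here):
#   esxi_build "$(vmware -v)"
# With a sample banner:
#   esxi_build "VMware ESXi 5.0.0 build-469512"
```

Recording the build number next to each host's behaviour makes it easier to tell whether a patch level, rather than the load, explains the difference.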
After an I/O latency spike, the host lost its management connection again (not responding).
The remaining hosts recovered access within 1 second and survived.
Is there any way to recover the iSCSI connection and bring management back up using SSH?
Thanks a million.