VMware Cloud Community
kiddx
Contributor

Problems with HA/DRS: hanging VMs

We have a test lab that was on ESX 4.1 with vSphere, running five servers and a Nexenta SAN (just a test/dev lab).

We recently upgraded to ESXi 5 and vCenter 5 by shutting all VMs down, doing a complete uninstall of vCenter 4 and a fresh install of vCenter 5, and then upgrading each server to ESXi 5.

Since then, VMs often end up stuck in no man's land when we migrate them manually or put a server into maintenance mode. The migration reaches about 66%, and then we start getting all kinds of iSCSI errors. The VM gets to a point where it is migrating off server1 but never shows up on server2. I end up with a server that is unresponsive: right-click on it and everything is greyed out. Even after waiting an hour, the VM never comes back. We end up having to turn off HA across the board, which seems to free it up. I imagine restarting the management network on the hosts might also work, but I haven't tried that yet.

This happens consistently, and at this point we feel it is pretty unreliable for us. It could be an issue between Nexenta (3.1) and ESXi 5, but we aren't 100% sure whether the errors we are seeing actually point to a problem with the SAN.

2011-09-26T20:47:20.322Z cpu3:2051)ScsiDeviceIO: 2316: Cmd(0x4124007ab2c0) 0x1a, CmdSN 0x2bc6d to dev "mpx.vmhba0:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
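To check whether a ScsiDeviceIO line like the one above actually implicates the SAN, it can help to decode its fields. The following is a minimal Python sketch written for this one log format only. It assumes the H:/D:/P: fields are the host, device and plugin status as VMware commonly documents them, and its lookup tables cover only a handful of standard SCSI codes. The names `decode`, `LOG_RE` and the table names are hypothetical, not part of any VMware tool.

```python
import re

# Small subsets of the standard SCSI code tables; extend as needed.
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x12: "INQUIRY", 0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEYS = {0x0: "NO SENSE", 0x2: "NOT READY", 0x3: "MEDIUM ERROR",
              0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE",
            (0x24, 0x00): "INVALID FIELD IN CDB"}

# Assumed layout: Cmd(<ptr>) <opcode>, ... dev "<name>" failed H:.. D:.. P:.. Valid sense data: key asc ascq
LOG_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),.*?dev "([^"]+)" failed '
    r'H:(0x[0-9a-f]+) D:(0x[0-9a-f]+) P:(0x[0-9a-f]+) '
    r'Valid sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)',
    re.IGNORECASE,
)


def decode(line):
    """Return a dict describing a failed ScsiDeviceIO command, or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    groups = m.groups()
    dev = groups[1]
    # Every field except the device name is a hex number such as "0x1a".
    op, h, d, p, key, asc, ascq = (int(groups[i], 16) for i in (0, 2, 3, 4, 5, 6, 7))
    return {
        "device": dev,
        "opcode": SCSI_OPCODES.get(op, hex(op)),
        "host_status": h,    # 0 means no error reported at the host/HBA layer
        "device_status": DEVICE_STATUS.get(d, hex(d)),
        "plugin_status": p,  # 0 means no error reported by the multipathing plugin
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


LINE = ('2011-09-26T20:47:20.322Z cpu3:2051)ScsiDeviceIO: 2316: Cmd(0x4124007ab2c0) 0x1a, '
        'CmdSN 0x2bc6d to dev "mpx.vmhba0:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x20 0x0')

if __name__ == "__main__":
    print(decode(LINE))
```

For the line quoted in the post, this decodes to a MODE SENSE(6) command that the device rejected with CHECK CONDITION, sense key ILLEGAL REQUEST and ASC/ASCQ INVALID COMMAND OPERATION CODE. In other words, the device simply does not support that command, and the host and plugin layers report no error. The device is also an mpx. path, a prefix ESXi typically uses for devices that do not report a unique identifier (often local devices). So this particular message may not come from the Nexenta iSCSI LUNs at all, though that should be confirmed against the host's device list.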

We are also seeing messages like "State in Doubt" and "request fast path state update", but only during migrations.

ashleyw
Enthusiast

We were having stability issues with NexentaStor CE and vSphere 5, and found out that specific patches (COMSTAR and others) were released earlier this week. I don't know which release you are on, but check that you are on 3.1.1 and fully patched with "setup appliance upgrade". The CE 3.1.1 versions/builds that seem to be reliable for us (so far) under CIFS and iSCSI are: NMS Version: 3.1.1-6584 (r9461), NMC Version: 3.1.1-6622 (r9469), NMV Version: 3.1.1-6608 (r9456), OS Version: 3.1.1.

Even after this, you'll probably see the same issue that others (and I) are seeing on multiple different SANs: long boot times on the ESXi 5 hosts. It all seems to be related to attaching to IP-based storage (particularly iSCSI) under vSphere 5.

kiddx
Contributor

Thanks. We have a scheduled upgrade of both Nexenta units and the latest ESXi patches this week. Hopefully this solves our issues.
