For some reason one of our storage volumes disappeared in our ESX
environment, and since we had to restart both I/O controllers on the
SAN we had to shut down the entire environment.
Status: All the storage volumes are back and are fine and idle. One of
our ESX servers has been started up and sees all the storage. Once I
start vCenter and start up one VM, it gets stuck at "In progress" and that's
it! Still 200 VMs to go!!
Anyone here that can help me out?
Many thanks, much appreciated!
I'd suggest that you need to track down WHY the storage disappeared in the first place.
This is more than likely related. More often than not, if storage went AWOL from all ESX hosts, it is related to either the network, or your storage itself actually 'blipped'.
Try some simple storage testing (copy speeds etc.) and see whether performance is what it should be before loading anything from the ESX.
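A rough way to put a number on "copy speeds" is sketched below. It is only an illustration: it assumes a machine with Python 3 that has the storage mounted somewhere, and the function name `measure_throughput` and the mount path in the usage note are made up for the example. The read figure can be inflated by the operating system's cache, so treat the write figure as the more honest one.

```python
import os
import tempfile
import time


def measure_throughput(directory, size_mb=64, chunk_mb=1):
    """Write a temporary file into `directory`, read it back,
    and return (write_MB_per_s, read_MB_per_s)."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Timed sequential write, flushed to the device before stopping the clock.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        write_secs = max(time.perf_counter() - start, 1e-9)

        # Timed sequential read (may be served from the OS cache).
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(len(chunk))
                if not data:
                    break
                total += len(data)
        read_secs = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # always clean up the test file

    megabytes = total / (1024 * 1024)
    return megabytes / write_secs, megabytes / read_secs
```

For example, `measure_throughput("/mnt/san_volume", size_mb=512)` (hypothetical path) run against each volume in turn would show whether one of them is far slower than the others.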
Can the ESX hosts still see the HBAs / iSCSI adapters? Are you able to browse the datastores? Have you tried a rescan of the adapters? What other messages can you see relating to that VM in Tasks & Events?
Okay, I'm back (after 16 hours of working) and here's what happened:
For some reason one of our 5 volumes disappeared for all ESX servers. Some of them didn't show the volume anymore, while other ESX servers still showed the problematic volume in their inventory. When browsing the inventory, nothing happened. The stupid thing: 60% of all VMs on the problematic volume were still able to start!?
In the end we shut down everything including the I/O controllers and started everything up again: that solved the problem. Here's my main problem:
When I shut down the SAN and boot it up again, about 8TB of data becomes available. When I boot up one ESX server and reconnect the server in VC, everything looks fine, including the storage. When I launch the first VM the process hangs at "In progress" and needs about 45 minutes to start. After that, everything boots up quickly, even on all the other ESX servers.
I have to say we have nearly no DNS since all our DNS servers are VMs. Could this be why the VMs start so slowly after a complete network shutdown?
Most likely it's the DNS server; also check the licensing server. If the DNS server is not working, the ESX host can't find the licensing server.
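One quick way to see whether name resolution is the thing stalling is to time a few lookups from a management machine. This is just a sketch: the function name `time_lookup` and the hostnames in the usage note are placeholders, not anything from this environment.

```python
import socket
import time


def time_lookup(name):
    """Resolve `name` and return (address_or_None, seconds_taken).
    A failed or very slow lookup points at DNS rather than storage."""
    start = time.perf_counter()
    try:
        address = socket.gethostbyname(name)
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve.
        address = None
    return address, time.perf_counter() - start
```

Something like `for host in ["vcenter.example.local", "license.example.local"]: print(host, time_lookup(host))` (hypothetical names) would show lookups that fail or take many seconds while the DNS VMs are still down.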
We have had similar issues. A while back, our entire switch stack failed. The ESX hosts detected the lack of contact from other hosts and HA shut down all of the VMs as the isolation response. When we got the stack back up, we started trying to power on VMs. We always had issues with the DCs (also our DNS infrastructure). Once the first DC was up, everything would boot normally. I have resolved the issue a couple of ways (we had a power outage a month later that caused the same problem).
1.) The first time, shutting down the iptables service on the host and then restarting the DC allowed it to come up normally (I have no idea why, but it worked once, so I'm including it).
2.) Every other time I have had this issue, we have had to start the first DC in Directory Services Restore Mode. While it is running, we start the second DC normally, restart the first DC, and all is well from there.
Recently, I came up with the idea of creating a snapshot (including memory) of the primary DC on a periodic basis. While I have not had to use it yet, my theory is that the boot process is the problem, and if I can revert to a snapshot of the DC in a known good running state, the rest of the VMs should come up without issue.