Hi,
We had an odd problem on Christmas morning (great timing!) where a bunch of VMs froze. They had disk errors in their log files (all Ubuntu Linux). When I logged in to check the problem, it appeared that a "remove snapshot" task was stuck at 95%. Once I cancelled this and moved the VMs to a new host, they started OK.
Today I've logged in to check everything and make sure nothing else was going on, and saw the following events in our log (we had a lot of these on Christmas morning as well, but with longer gaps before reconnection).
- Lost access to volume 4acc2582-be797e89-5292-001517c86db6 (sanstore01) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. 26/12/2010 2:30:24 a.m.
- Successfully restored access to volume 4acc2582-be797e89-5292-001517c86db6 (sanstore01) following connectivity issues. 26/12/2010 2:30:24 a.m.
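To compare how long each disconnection lasted across the two mornings, the "Lost access" and "Successfully restored access" events can be paired up per volume. The following is only a rough sketch, assuming the events have been exported as plain text lines in exactly the format shown above (dd/mm/yyyy with an "a.m."/"p.m." suffix); the function names `parse_event` and `outage_durations` are made up for illustration and are not part of any VMware tool.

```python
import re
from datetime import datetime

# Matches the two event messages as they appear in the exported event list.
# Assumption: timestamp is "d/m/yyyy h:mm:ss a.m." or "... p.m." at end of line.
EVENT_RE = re.compile(
    r"(?P<kind>Lost access to|Successfully restored access to) volume "
    r"(?P<uuid>[0-9a-f-]+) \((?P<name>[^)]+)\).*?"
    r"(?P<d>\d{1,2})/(?P<mo>\d{1,2})/(?P<y>\d{4}) "
    r"(?P<h>\d{1,2}):(?P<mi>\d{2}):(?P<s>\d{2}) (?P<ampm>[ap])\.m\."
)


def parse_event(line):
    """Return (kind, uuid, name, timestamp) or None if the line is not an event."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    # Convert the 12-hour clock to 24-hour without relying on locale settings.
    hour = int(m["h"]) % 12 + (12 if m["ampm"] == "p" else 0)
    when = datetime(int(m["y"]), int(m["mo"]), int(m["d"]),
                    hour, int(m["mi"]), int(m["s"]))
    kind = "lost" if m["kind"].startswith("Lost") else "restored"
    return kind, m["uuid"], m["name"], when


def outage_durations(lines):
    """Pair each 'lost' event with the next 'restored' event for the same volume."""
    pending = {}   # volume uuid -> time access was lost
    outages = []   # (datastore name, start time, duration in seconds)
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        kind, uuid, name, when = event
        if kind == "lost":
            pending.setdefault(uuid, when)  # keep the earliest unmatched loss
        elif uuid in pending:
            start = pending.pop(uuid)
            outages.append((name, start, (when - start).total_seconds()))
    return outages


if __name__ == "__main__":
    sample = [
        "- Lost access to volume 4acc2582-be797e89-5292-001517c86db6 (sanstore01) "
        "due to connectivity issues. Recovery attempt is in progress and outcome "
        "will be reported shortly. 26/12/2010 2:30:24 a.m.",
        "- Successfully restored access to volume 4acc2582-be797e89-5292-001517c86db6 "
        "(sanstore01) following connectivity issues. 26/12/2010 2:30:24 a.m.",
    ]
    for name, start, seconds in outage_durations(sample):
        print(f"{name}: lost at {start}, restored after {seconds:.0f} s")
```

For the two events above, the lost and restored timestamps are identical, so the duration comes out as 0 seconds. Running the same pairing over the Christmas morning events would show how much longer those gaps were.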
I've checked the SAN logs and cannot see anything related to this; there are no errors around either time.
Where should I be looking to find the problem here? There is a VM on this store that rebuilds some big search indexes and does a lot of disk I/O, but we've never had a problem before. I upgraded the ESXi hosts a couple of weeks ago with the latest patches.
Any guidance appreciated!
Thanks
Alain