srwsol
Hot Shot

Time for an update. I bought an LSI 9361-8i controller, along with the capacitor battery backup unit so that I could enable write caching. Unfortunately, this changed nothing. I also updated ESXi to the July patch level, and that changed nothing either. Finally, I tried the latest driver from LSI (which the ESXi patch process promptly replaced when I applied it), and that made no difference.

I'm beginning to think this is some sort of software problem related to file locking, as it only happens when certain VMs start and stop: at the moment they first start (before the console display appears), and when they stop and transition from running to inactive. The two VMs it always happens to are a Windows SBS 2008 VM and the vCenter Server appliance VM. The only things I can see that those two have in common are that they both have multiple vCPUs assigned and (maybe this is important) they both have quite a number of virtual disks in their configuration.

Somehow, when these VMs start and stop, all I/O to the datastore stops for long enough that the watchdog timer goes off and the lost access process begins. I actually think that whatever is being done is happening at the controller level (i.e. ESXi has issued some sort of command to the controller that is hanging things up). When I had both SBS 2008 and the vCenter Server appliance on this machine and they both started at once, it caused a big enough problem that the controller threw a disk error and started a rebuild. That happened a couple of times until I moved the vCenter Server VM to another machine.

I've more or less ruled out a hardware problem, because the rebuilds occurred on different disks in the array and these are new SSD drives. There are also never any errors or lost access messages when I download the files for both VMs from the server, as one would expect if there were a disk problem and it was having trouble reading data. Nor are there any errors or lost access messages no matter how hard I stress the array with reads and writes. It's only at VM startup and shutdown, and only for certain VMs, that this happens.

I'm stumped at this point, and my wallet has been drained from replacing parts. Not a good situation. :(
