VMware Cloud Community
Rob95
Contributor

VMware Local SAS disk becoming Unavailable

Hi all,

First post here, hoping you guys can lend a newbie a hand.

I recently applied all the available patches and upgrades via vCenter Upgrade Manager, and since then one of my hosts has been having an issue.

Twice now, one of the local disks has become "Unavailable", taking all the VMs running off that disk offline.

I'm running a Dell PowerEdge R630, on VMware ESXi, 6.0.0 build 7504637.

What can I do to troubleshoot this?

My only resolution at the moment is to reboot the host. However, it's one of the most populated hosts we have, running a large number of critical VMs, so a reboot has a real impact on the teams' workflow.

Could you give me any pointers for what to look for to troubleshoot this?

I tried looking at vmkernel.log while the disk was offline, but I just got an I/O error and the file wouldn't open. (I'm assuming the logs are stored on the problematic disk.)

The host has 3 local disks (2 x 1.5TB SAS SSDs and 1 x 400GB SAS HDD) as well as two 10TB SAN partitions.

The disk in question only has about 15 out of 80 VMs running off of it.

3 Replies
daphnissov
Immortal

Hi and welcome.

> I recently applied all the available patches and upgrades via vCenter Upgrade Manager, and since then one of my hosts has been having an issue.

What was the source version and build?

> I'm running a Dell PowerEdge R630, on VMware ESXi, 6.0.0 build 7504637.

What is your BIOS version?

> What can I do to troubleshoot this?

Start by looking at vmkernel.log on the ESXi host. When you pinpoint the time, look at hostd.log to see what occurred just before.
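
To make the "what occurred just before" step concrete, here is a minimal Python sketch that pulls out the lines from a copy of a log that fall in the minutes leading up to an incident. It assumes each entry starts with a UTC timestamp such as 2018-01-29T11:20:03.512Z, which is how ESXi logs are typically stamped. The function names, the 10-minute window, and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layouts: with and without fractional seconds.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    parts = line.split(None, 1)
    if not parts:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(parts[0], fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None


def lines_before(lines, incident, minutes=10):
    """Yield lines stamped within `minutes` before `incident` (inclusive)."""
    start = incident - timedelta(minutes=minutes)
    for line in lines:
        ts = parse_ts(line)
        # Lines without a timestamp (e.g. continuation lines) are skipped.
        if ts is not None and start <= ts <= incident:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample entries, not real log output.
    sample = [
        "2018-01-29T11:05:00.000Z cpu1:1000)too early",
        "2018-01-29T11:15:30.250Z cpu1:1000)inside the window",
        "2018-01-29T11:25:00.000Z cpu1:1000)after the incident",
    ]
    incident = datetime(2018, 1, 29, 11, 20, tzinfo=timezone.utc)
    for entry in lines_before(sample, incident):
        print(entry)
```

Run it against copies of vmkernel.log and hostd.log pulled off the host, with `incident` set to the time the datastore dropped, and compare the two outputs side by side.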

> Twice now, one of the local disks has become "Unavailable", taking all the VMs running off that disk offline.

So you're running critical VMs on local storage only despite having external, shared storage?

> The host has 3 local disks (2 x 1.5TB SAS SSDs and 1 x 400GB SAS HDD) as well as two 10TB SAN partitions.

Which disk experienced the connection loss? I'm assuming your 2 x 1.5 TB SSDs are in some form of RAID, correct? If so, what storage adapter device is connecting the devices?

In general, it's not a good idea to run production VMs off local storage, for the reasons you just experienced.

Rob95
Contributor

Hi daphnissov,

I didn't note the version before the update. The vCenter I used to upgrade the machine is version 6.5.0, build 7312210.

I'll have to see if our datacenter team can check the BIOS version. I don't have physical access to the host, nor do I have iDRAC access.

I checked vmkernel.log, but didn't check hostd.log. After checking it, it only goes back to 2018-01-29T12:10:57, whereas my issue happened at 11:20 GMT. I'll have to hope it doesn't happen again, but if it does, I'll make sure to check both.
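
For anyone in the same position, a rough Python sketch for checking whether a given log file (or a rotated copy of it) reaches back far enough to cover the incident. It assumes entries start with a UTC timestamp like 2018-01-29T12:10:57.000Z and that rotated copies are gzip-compressed. The helper names and any file paths passed in are hypothetical.

```python
import gzip
from datetime import datetime, timezone

# Assumed timestamp layouts: with and without fractional seconds.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def first_timestamp(path):
    """Return the first parseable UTC timestamp in a plain or .gz log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        for line in fh:
            parts = line.split(None, 1)
            if not parts:
                continue
            for fmt in _FORMATS:
                try:
                    ts = datetime.strptime(parts[0], fmt)
                    return ts.replace(tzinfo=timezone.utc)
                except ValueError:
                    continue
    return None


def starts_before(path, incident):
    """True if the file's first entry is at or before the incident time."""
    first = first_timestamp(path)
    return first is not None and first <= incident
```

With the times from this thread, a file whose first entry is 12:10:57 would return False for an 11:20 incident, so an older rotated copy would be the next place to look, if one exists.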

This host was set up before I joined, and I've since used it as a template for our new hosts. One of the two 10TB SANs is used as a NAS in the office, and the other is for VMware and is almost full. All our hosts use local storage as their primary storage. I didn't realise that this setup is generally a bad idea. At this point it would be quite a lot of work to move all the VMs over to shared storage. We have 7 hosts set up like this, and this is the first time we have had an issue with the local disks (well, the second, as this also happened a couple of weeks back).

The host is JBOD, with no RAID across the SSDs. The disk in question was one of the two SSDs.

I'll have to look at the best way to get a new SAN signed off.

Would the ideal setup be to have 0 local storage, and have it all on a shared SAN?

daphnissov
Immortal

> Would the ideal setup be to have 0 local storage, and have it all on a shared SAN?

Ideally, you should use local storage for the ESXi installation (flash media such as a USB stick or SD card is very common) and shared storage over a SAN for all virtual machines. Again, generally speaking, unless you have a very specific use case, storing and running production VMs on ESXi local storage is a very bad idea and should be avoided.