VMware Cloud Community
sankar17
Contributor

vCenter reboot causes disk failures in the vSAN cluster; any guidance on rebooting vCenter when it is configured with a vSAN cluster will be helpful

Hello VMware experts,

I am experimenting with a vSAN lab setup. Rebooting vCenter causes disk failures in the vSAN cluster. The disk error reads "Flash Disk is down" (Permanent disk failure).

Any guidance on rebooting vCenter when it is configured with a vSAN cluster will be helpful.

  1. vCenter is configured with a vSAN cluster.
  2. The vSAN cluster has 4 nested ESXi hosts [each ESXi has 4 vCPU / 16 GB RAM / 200 GB HDD / 20 GB SSD (original type: HDD, type just forced to SSD)].

Sometimes the vSAN cluster disks go into an error state even without a reboot.

Thanks,

Sankar

3 Replies
TheBobkin
Champion

Hello Sankar,

The problem with exposing HDDs as SSD cache-tier devices is that they may not be able to keep up with the demands asked of them and if they are really struggling they may get marked as failed.

Just to make sure I am understanding you correctly: are the vCenter machine's vmdks located on the vsanDatastore?

Also, are the disks getting dropped the INSTANT you boot the VM or after a minute or two?

The reason I ask this is that vCenter VM (or vCSA here?) can get relatively busy during startup as it starts all the services.

If the disks are dying during this period, you could experiment with starting the services manually (or with scheduled delay) over a longer time-frame.
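A rough sketch of that staggered start, assuming a vCSA where the `service-control` CLI is available. The service names and the 120-second delay are illustrative assumptions only, so check what `service-control --list` reports on your appliance first. The helper names (`build_start_plan`, `run_plan`) are made up for this example, and it does not cover stopping the services from auto-starting at boot.

```python
import subprocess
import time


def build_start_plan(services, delay_seconds):
    """Return (offset_seconds, command) pairs, one per service, spaced by delay_seconds."""
    return [
        (index * delay_seconds, ["service-control", "--start", name])
        for index, name in enumerate(services)
    ]


def run_plan(plan, dry_run=True):
    """Walk the plan in order; in dry-run mode only print what would be run."""
    elapsed = 0
    for offset, command in plan:
        if dry_run:
            print(f"+{offset}s: {' '.join(command)}")
        else:
            # Wait out the gap since the previous service, then start this one.
            time.sleep(offset - elapsed)
            subprocess.run(command, check=True)
        elapsed = offset


if __name__ == "__main__":
    # Hypothetical service list; replace with the names on your own appliance.
    example_services = ["vmware-vpostgres", "vmware-vpxd", "vsphere-client"]
    run_plan(build_start_plan(example_services, delay_seconds=120), dry_run=True)
```

Running it with `dry_run=True` only prints the schedule, which makes it easy to watch the vSAN disks between steps and see which service start (if any) pushes them over.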

Which version of vSAN are you running here?

If possible, you should consider utilising an actual SSD for the cache devices (it doesn't have to be huge or enterprise-grade, so we are not talking about much investment).

What are these HDDs backed by? Are they on SAN or just local disks and if so, what kind of controller are the HDD devices attached to?

A great place to start with design of nested setups would be this awesome blog:

http://www.virtuallyghetto.com/nested-virtualization

Bob

sankar17
Contributor

Thanks, Bob, for the reply. Please find my answers below.

  1. Yes, I followed http://www.virtuallyghetto.com/nested-virtualization to set up the lab.
  2. The vCenter appliance (VCSA, embedded controller) is not on the vsanDatastore. It is deployed on the bare-metal ESX host.
  3. vSphere 6.0 U2 and vSAN 6.2.
  4. The attached disks are local disks of the respective hosts. The RAID controller is a PERC H730 with 1 GB NV cache.
    1. Each host has 3 HDDs [Disk1: 2 GB OS, Disk2: 200 GB, Disk3: 20 GB (exposed as SSD)]

Note:

  1. The disk error state also occurs without a reboot. Sometimes vSAN works well for more than 24 hours. At other times the vSAN datastore drops to 0 B capacity within a few hours due to disk errors.
  2. With a VCSA reboot, the disks always go into an error state.

Thanks,

Sankar

admin
Immortal

I am not sure what the root cause is, as your environment is nested.

But if you are using ESXi 6.0 U2, you may want to look at the KB below:

Required vSAN and ESXi configuration for controllers based on the LSI 3108 chipset (2144936) | VMwar...

There are configuration settings you may have to change. (These became the default values from 6.0 P3.)
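As a hedged sketch of what checking and changing those settings could look like: the two option paths and values below are as I recall them from KB 2144936, so treat them as assumptions and confirm them against the KB before applying anything. The script only builds and prints the `esxcfg-advcfg` commands (`-g` to read, `-s` to set) for you to run on each ESXi host; the helper names are made up for this example.

```python
# Option paths and values recalled from KB 2144936; assumptions, verify in the KB.
LSOM_SETTINGS = {
    "/LSOM/diskIoTimeout": 100000,
    "/LSOM/diskIoRetryFactor": 4,
}


def get_commands(settings):
    """Commands that read the current value of each advanced option."""
    return [["esxcfg-advcfg", "-g", path] for path in settings]


def set_commands(settings):
    """Commands that set each advanced option to the given value."""
    return [
        ["esxcfg-advcfg", "-s", str(value), path]
        for path, value in settings.items()
    ]


if __name__ == "__main__":
    # Print the commands rather than executing them, so they can be reviewed first.
    for command in get_commands(LSOM_SETTINGS) + set_commands(LSOM_SETTINGS):
        print(" ".join(command))
```

Reading the current values first shows whether the host already has them, which should be the case on builds where they became the default.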

Thanks
