VMware Cloud Community
mat2k7
Enthusiast

VSAN 6.6.1

Hi,

I'm running vSAN 6.6.1 on six Dell R730 hosts. Occasionally the CPU on one host goes up to 100% while trying to shut down a single VM. vCenter won't stop the task and reports that another task is already running; the message reappears every 5 minutes. As soon as this happens the host becomes unresponsive and vCenter starts showing vSAN integrity problems. The host still responds to ping, but you cannot access it from the GUI. The only way to get the host back to normal is to connect via SSH, identify the VM causing the issue (which can take over an hour until the task is processed) and eventually kill the VM, which again can take up to an hour. Once the VM is shut down, everything gets back to normal. I have seen this happen four times while trying to deploy Horizon Instant Clones.
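For reference, the SSH step described above (find the VM's world ID, then kill it) can be sketched roughly as follows. This is only a minimal sketch: it assumes the host's `esxcli vm process list` output uses an unindented VM name followed by indented `Key: Value` lines, and the sample text, the VM names and the helper names (`parse_vm_process_list`, `find_world_id`, `kill_command`) are all made up for illustration. It only builds the kill command; it does not run it.

```python
# Hypothetical sample of `esxcli vm process list` output (names and IDs are invented).
SAMPLE = """\
clone-01
   World ID: 2100321
   Process ID: 0
   Display Name: clone-01
   Config File: /vmfs/volumes/vsanDatastore/clone-01/clone-01.vmx

clone-02
   World ID: 2100455
   Process ID: 0
   Display Name: clone-02
   Config File: /vmfs/volumes/vsanDatastore/clone-02/clone-02.vmx
"""


def parse_vm_process_list(text):
    """Parse the listing into one dict per VM (assumed layout, see above)."""
    vms = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new VM entry.
            current = {"Name": line.strip()}
            vms.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so paths keep their content.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return vms


def find_world_id(text, display_name):
    """Return the world ID of the VM with this display name, or None."""
    for vm in parse_vm_process_list(text):
        if vm.get("Display Name") == display_name and "World ID" in vm:
            return int(vm["World ID"])
    return None


def kill_command(world_id, kill_type="soft"):
    """Build (not run) the esxcli kill command; try soft before hard or force."""
    if kill_type not in ("soft", "hard", "force"):
        raise ValueError("kill_type must be soft, hard or force")
    return ["esxcli", "vm", "process", "kill",
            "--type={}".format(kill_type),
            "--world-id={}".format(world_id)]


if __name__ == "__main__":
    wid = find_world_id(SAMPLE, "clone-02")
    print(" ".join(kill_command(wid)))
```

Starting with the `soft` type and escalating only if the VM does not stop is the usual order, since `force` is meant as a last resort.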

I know I should update everything to 6.7, but that isn't possible due to compatibility issues starting with 6.7. Any idea what could be causing this?

Thanks

3 Replies
TheBobkin
Champion

Hello Mat,

That doesn't sound too vSAN-specific and thus you should consider opening a Support Request with our EUC and/or System Management teams for further investigation - though if you do open an SR, please PM me the number once host logs are up so that I can validate there is nothing misconfigured on the vSAN side of things.

The various triggered vSAN Health alerts are likely just a result of it not being able to communicate with the host and thus it is expected to have a number of failed tests (e.g. multiple network tests failed, communication with node, vCenter vs vSAN cluster members, failed to retrieve disk info etc.).

Bob

mat2k7
Enthusiast

Hello Bob,

Thanks, I'm going to raise a support request this week or next. The vSAN configuration was done by a former colleague of mine who is unfortunately no longer available, so there might be some misconfiguration on the vSAN end. As you said, it may not be a vSAN-specific issue, but it all started as soon as vSAN was implemented. The first time the issue occurred was when using Horizon with instant clones and NVIDIA GRID cards. The issue seemed to be isolated to these VMs, but unfortunately it wasn't.

I'm able to work around the issue temporarily by moving the problematic VM to a Dell storage datastore, removing all snapshots and performing a consolidation. Once that is done, I can move the VM back to vSAN, take a snapshot and start deploying instant clones again. It is not possible to perform a consolidation on vSAN VMs even though no snapshot is present, but to be honest I'm fairly new to vSAN, so this might be by design. If this is not done, the VM will cause the same issues on the host or even fail when deploying Horizon clones.

Mat

mat2k7
Enthusiast

Hello Bob,

The issue has not recurred yet, but I will raise a call as soon as it does.

Br

Mat
