VMware Cloud Community
meqajbif
Contributor

Problems with vSAN after upgrade from 6.0 U3 to 6.5 (latest version)

Hello everybody!

I have 3 servers of the same configuration (HCL-compatible). They were running vSphere/ESXi 6.0 at the latest version (U3), with vSAN deployed. Everything was great - it worked for a long time with no errors and no problems.

After updating to 6.5 (via ISO 201704001-5310538), all hosts have problems:

1. They do not enter Maintenance Mode - the operation fails with the error "Operation timed out".

2. Also, the hosts now take a very long time to boot (approximately 20 minutes instead of 5).

3. In vSAN Health I see the error "vSAN CLOMD liveness - failed"; the status for the hosts is "Abnormal" ("cannot connect to the clomd process, it may be down").

Running "/etc/init.d/clomd start" does start CLOMD, but it goes down again after about a minute.
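For completeness, this is roughly what I run over SSH to check the service (assuming the standard clomd log location, /var/log/clomd.log):

    /etc/init.d/clomd status      # reports whether the clomd process is running
    /etc/init.d/clomd start       # start the service manually
    tail -20 /var/log/clomd.log   # last log entries - the reason clomd dies should show up here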

*Note (maybe important?): after updating the hypervisor and vCenter Server, the drivers and firmware for the disk subsystem were updated via the Web Client (vSphere itself suggested the update).

Any help?


9 Replies
GreatWhiteTec
VMware Employee

Going from 6.0 U3 to 6.5d (6.6) is not a supported path.

See Supported upgrade paths for vSAN 6.6 (2149840) | VMware KB

meqajbif
Contributor

What can I do now? How can I fix the situation?

a_p_
Leadership

I'm afraid that the best (and maybe only) option in this case is to immediately open a support case with VMware.

Assuming that you are not the first one who ran into this, they might be able to solve the issue.


André

meqajbif
Contributor

Unfortunately, we only have an evaluation license key, so I cannot open a technical support request through my personal account. Maybe there is another way?

GreatWhiteTec
VMware Employee

Is this a lab/test environment?

You also need to be aware that there is a 6.6 version of vCenter. When upgrading vSphere, with or without vSAN, vCenter should be upgraded first, then the ESXi hosts.

If this is a production environment, GSS may need to be involved. If you are testing 6.6, you could re-image each host with 6.6, but you would have to configure those hosts from scratch, which is a downside. If you have host profiles in place, it would be easier.

meqajbif
Contributor

First of all - thank you very much for your patience and help!

1. Yes, it's a test lab, but we do not want to set everything up from scratch...

2. Yes, we updated vCenter to the latest version (6.5d) first.

Questions:

1. Do I understand correctly that the problem is not with updating vCenter, but only with updating the HOSTS from the 6.0 U3 version?

2. Is it true that if I completely reinstall ESXi on the hosts, the problems will disappear?

3. Do I need to reinstall vCenter in this case or not?

4. Will the data on vSAN be corrupted? (I read that it should not be damaged when reinstalling the hypervisor.)

5. Can I use a host profile in vCenter when reinstalling ESXi, so that I do not have to re-enter the host settings manually?

MJMSRI
Enthusiast

Hi. As the hosts are affected, you could revert to the previous build. I have done this myself and it worked fine; however, I reverted before upgrading the on-disk format version to 5.0 (a quick way to check the format version is shown after the steps below).

  • Reverting an ESXi host is only valid if the host was updated using one of these methods:
    • VIB installation or removal
    • Profile installation or removal
    • ESXi host update using VMware Update Manager
    • Update from an ISO
  1. In the console screen of the ESXi host, press Ctrl+Alt+F2 to see the Direct Console User Interface (DCUI) screen.
  2. Press F12 to view the shutdown options for the ESXi host.
  3. Press F11 to reboot.
  4. When the Hypervisor progress bar starts loading, press Shift+R. You will see the warning:

    Current hypervisor will permanently be replaced
    with build: X.X.X-XXXXXX. Are you sure? [y/n]

  5. Press Y to roll back the build.
  6. Press Enter to boot.

Reverting to a previous version of ESXi (1033604) | VMware KB
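
As a sanity check before and after the rollback, you can confirm the running build and the on-disk format version from the ESXi shell (the exact field name in the esxcli output may vary slightly by build):

    vmware -vl                                           # running ESXi version and build number
    esxcli vsan storage list | grep -i "format version"  # on-disk format version per vSAN disk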

FM19999999
Enthusiast

I am running vSAN 6.6 and ran into the same issues.

Maintenance mode would time out.

I finally found that CLOMD was abnormal on one host.

I SSH'd to the host and checked the status of the service, and found it was not running.

I started it using /etc/init.d/clomd start, checked the status again, and found it was now running.

I retried the maintenance mode and it completed with no issues.
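
In short, the sequence on the affected host was roughly this (the esxcli line is a shell equivalent of retrying maintenance mode from the client; adjust the vSAN mode to your needs):

    /etc/init.d/clomd status    # showed clomd was not running
    /etc/init.d/clomd start     # started the service
    /etc/init.d/clomd status    # confirmed it was now running
    esxcli system maintenanceMode set -e true -m ensureObjectAccessibility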


TheBobkin
Champion

Hello,

The OP said they already tried restarting clomd, so that is not the solution here. Also, a caveat about starting this service: on Witness nodes clomd does not run, and it should be left that way.

From the symptoms described, it sounds like they may have been hitting this issue, which is resolved in 6.5 U1 (and which has some potential workarounds I won't go into here):

kb.vmware.com/kb/2149968
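
To quickly check whether a host is still on a build older than 6.5 U1, the running version and build can be read from the shell:

    vmware -vl                  # ESXi version, build number and update level
    esxcli system version get   # the same information via esxcli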

Bob