VMware Cloud Community
x007alfa
Enthusiast

vSphere HA Cluster - Something is missing and I don't know what...

Hi all,

I have a newly formed cluster in my lab.

My cluster is composed as follows:

  1. 3x HPE DL380 Gen8. Specs:
    1. 2x Intel Xeon E5-2670 v2 (10c/20t);
    2. 16x 8GB DDR3 ECC (128GB total);
    3. 2x SAS 15k rpm 146GB disks as local swap location;
    4. ESXi installed on a 32GB SD card;
    5. 530FLR-SFP+ (dual-port 10 Gbps SFP+ FlexibleLOM card);
    6. X520-DA1 (single-port 10 Gbps SFP+ PCIe card);
    7. I350-T4 (quad-port 1 Gbps RJ45 PCIe card);
  2. 1x Dell PowerConnect 8024F 24-port 10 Gbps SFP+ switch;
  3. 1x HPE StoreVirtual 4530. Specs:
    1. 12x HPE 600GB 15k rpm SAS drives in RAID 6;
    2. X520-DA2 (dual-port 10 Gbps card).

I used the 530FLR-SFP+ ports to connect the hosts to the SAN via the switch.

I used the X520-DA1 and one of the 1 Gbps NICs for my VM network, with the 1 Gbps port configured as standby for failover.

I used two more ports on the I350 for a gateway network, as I have a pfSense instance running on the cluster.

The cluster is up as of right now.

I installed the ESXi 7.0.0 HPE Custom Image and deployed the corresponding VCSA on host1 to manage the cluster.

Every guide I follow gets me to the point where HA should be enabled, but something is missing...

HA is enabled now, but the cluster shows a yellow warning triangle and says that a few VMs are waiting for an HA failover retry... whatever that means...

If I then go to HA monitoring, I see that Host2 is the master and that no hosts are connected to it... O.O

What should I do? I'm stuck.....

Thanks for any help...

Fabio

29 Replies
x007alfa
Enthusiast

Fun thing is, I have all that XD

The management network is good, with redundant NICs...

Ping RTT is less than 1 ms between all of them... way less... if I remember correctly, it is in the 0.350 ms range...

Shared storage... yes I have a SAN (StoreVirtual 4530)...

x007alfa
Enthusiast

Oh, and the fun thing is that I have now tried opening a support request and the submission page is apparently broken... way to go, VMware... When I open Chrome's developer console, it is full of errors...

x007alfa
Enthusiast

Has anyone ever experienced this kind of problem? I don't know how to get it to work, and the VMware docs are useless in this scenario... I couldn't find a single document explaining how to diagnose improper configuration or communication problems between the hosts... I followed the guides... am I just this unlucky???

I cannot open a ticket... the menus are all empty in my browser, and Chrome's developer console is full of errors when I open the support page...

scott28tt
VMware Employee

There's plenty of information here, including a checklist - hopefully something to help you: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.avail.doc/GUID-5432CA24-14F1-44E3-8...


-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
x007alfa
Enthusiast

Create a vSphere HA Cluster in the vSphere Client

I followed this to the letter... and it didn't work as expected... I'm away from the office for a few days now... I'm tired of wiping everything and redoing it all from scratch... it's a pain to do on three hosts... and HPE hosts take ages to boot...

I have a physically redundant management network and an iSCSI SAN on 10GbE connected to all hosts, and every host can ping every other host...

The hosts have Enterprise licenses and the vCenter has a Standard license... so everything is hunky-dory... NOT.

I don't know what to do next... I ran vmkping tests... it's all useless: everything looks perfect and nothing works... all pings are under 0.5 ms...

All cable runs are below 2m...

depping
Leadership

I have no idea why this is happening at this point. If it is all configured as you say, the pings work, and nothing is blocked from a firewall point of view, then it should just work. There is no need to reformat, as that doesn't change anything. You could disable and re-enable vSphere HA to see what happens, or click "Reconfigure for HA" on each host. If that doesn't work, file a support request; someone will need to look at it in more depth and go through the logs to see what is happening.
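
A quick way to rule out the firewall part is a small reachability check like the one below. It is only a sketch under a few assumptions: the HA agent (FDM) is documented to talk over port 8182 (TCP and UDP), the script only tests TCP, and it only shows whether the machine running it can reach each host, not whether the hosts can reach each other. The function name `tcp_port_open` is made up for this example.

```python
import socket

# Port the vSphere HA agent (FDM) is documented to use; adjust if your setup differs.
HA_AGENT_PORT = 8182


def tcp_port_open(host, port=HA_AGENT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `tcp_port_open("esxi2.lab.local")` (a hypothetical host name) returning False would point to a blocked port, an unreachable host, or an HA agent that is not listening.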

x007alfa
Enthusiast

Hi depping, it's crazy really... if there is some weird issue that only happens to 0.1% of people, it's sure to happen to me... I also thought it was pretty simple after I had tried it once or twice...

I did try disabling and re-enabling HA, and also a "Reconfigure for HA" host by host, but... nope... the weird thing is that vCenter HA (the 3 VCSA VMs) works perfectly with no anomalies... it just worked from the get-go...

As previously stated, I am having problems opening a ticket: the web page just returns a bunch of errors and I cannot select my product in the list... all the lists are empty for some reason...

I don't know what else to try... other than taking everything out, running my VMs on a standalone server for the moment, and hooking the servers and SAN up to a separate network... but really, that shouldn't change a thing... DNS records are good and FQDN resolution works fine... pings are good... I'm stuck... and nobody seems to have had any of these issues, which I really don't understand...
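
The DNS claim can be double-checked with a forward and reverse lookup per host. The snippet below is a minimal sketch, not an official procedure: `dns_consistent` is a made-up helper, it uses whatever resolver the machine running it is configured with (which may differ from what the ESXi hosts use), and any host names passed to it are placeholders.

```python
import socket


def dns_consistent(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that fqdn resolves to an IP and that the IP resolves back to fqdn.

    Returns (ok, detail). The forward/reverse parameters allow the lookups
    to be swapped out, for instance to test against canned answers.
    """
    try:
        ip = forward(fqdn)
        name, aliases, _ = reverse(ip)
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False, f"lookup failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    names = {n.lower().rstrip(".") for n in [name, *aliases]}
    if fqdn.lower().rstrip(".") in names:
        return True, f"{fqdn} <-> {ip}"
    return False, f"{fqdn} -> {ip} -> {name}"
```

Calling `dns_consistent("esxi1.lab.local")` (hypothetical name) for each host and the vCenter, and getting `True` every time, would at least confirm that forward and reverse records agree from that machine's point of view.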

I appreciate your help...

depping
Leadership

The only thing I can imagine, again, is that the problem is networking/security. I would recommend looking at /var/log/fdm.log. This is the HA log file; it may give a hint.
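
If the log is long, a small filter can help narrow it down. The sketch below assumes a copy of fdm.log has been pulled off the host; the keyword list is only a guess at useful search terms and says nothing about what the file actually contains, and the function names are made up for this example.

```python
def matching_lines(lines, keywords=("error", "warning", "fail", "timeout")):
    """Yield (line_number, text) for lines containing any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(k in text.lower() for k in lowered):
            yield number, text


def scan_file(path, keywords=("error", "warning", "fail", "timeout")):
    """Return the matching lines of the file at path as a list."""
    # errors="replace" keeps the scan going if the file has odd bytes in it.
    with open(path, errors="replace") as handle:
        return list(matching_lines(handle, keywords))
```

Something like `scan_file("fdm.log")` then returns only the lines worth reading first, with their line numbers so they can be found again in the full file.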

x007alfa
Enthusiast

I will surely do that. I didn't know that was the HA log file; I was trying to find it...

I'll check once I'm back in the office; I'm still away for work for a few days and should be back on Friday...

depping
Leadership

Each of the hosts will have a log file like that. I would look at the one on the master and the one on one of the slave nodes.
