VMware Cloud Community
ThorstenT
Enthusiast

ESXi 6.5 to vCenter 6.5 communication disrupted with custom SSL certificates

After many hours of debugging it looks like we hit a regression bug in 6.5 affecting customers with custom certificates on ESXi hosts. Has anyone else seen this behavior?

Thanks,

Thorsten

Summary (TL;DR)

The vSphere 6.5 product documentation states replacing ESXi SSL certificates is supported when using RSA keys with key sizes of 2048 bits or more.

While custom certificates with exactly 2048-bit keys work fine, larger key sizes subtly break the ESXi to vCenter communication. vCenter's vpxd cannot cryptographically verify heartbeat messages from such hosts and repeatedly declares them disconnected.

Adding ESXi 5.5 hosts with larger RSA keys to VCSA 6.5 does not show this behavior, which qualifies this as a regression bug.

Symptoms

After adding an ESXi host with a custom certificate to vCenter, vCenter intermittently loses its connection to the host. Every now and then, the default host connection alarm in vCenter is triggered and cleared again within a second.

Besides the symptoms described above, there are a number of suspicious log lines in vpxd.log:

error vpxd[7FB5BFF7E700] [Originator@6876 sub=vpxCrypt opID=HeartbeatModuleStart-4b63962d] [bool VpxPublicKey::Verify(const EVP_MD*, const unsigned char*, size_t, const unsigned char*, size_t)] ERR error:04091077:rsa routines:INT_RSA_VERIFY:wrong signature length

warning vpxd[7FB5BFF7E700] [Originator@6876 sub=Heartbeat opID=HeartbeatModuleStart-4b63962d] Failed to verify signature; host: host-42, cert: (**THUMBPRINT_REMOVED**), signature : (**RSA_SIGNATURE_REMOVED**)


# in case of 3072 bit keys

warning vpxd[7FB5BFF7E700] [Originator@6876 sub=Heartbeat opID=HeartbeatModuleStart-4b63962d] Received incorrect size for heartbeat Expected size (334) Received size (462) Host host-87


# in case of 4096 bit keys

warning vpxd[7FB5BFF7E700] [Originator@6876 sub=Heartbeat opID=HeartbeatModuleStart-4b63962d] Received incorrect size for heartbeat Expected size (334) Received size (590) Host host-87
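
To check whether other hosts are affected, the size warnings above can be pulled out of vpxd.log with a small script. This is only a sketch: the regular expression is derived from the two warning lines quoted in this post, and the function name find_size_mismatches is my own, not anything VMware ships.

```python
import re

# Pattern built from the heartbeat size warnings quoted in this post.
HEARTBEAT_SIZE_RE = re.compile(
    r"Received incorrect size for heartbeat "
    r"Expected size \((\d+)\) Received size \((\d+)\) Host (\S+)"
)


def find_size_mismatches(lines):
    """Yield (host, expected, received) for each heartbeat size warning."""
    for line in lines:
        match = HEARTBEAT_SIZE_RE.search(line)
        if match:
            expected, received, host = match.groups()
            yield host, int(expected), int(received)


# The two warnings from above, shortened to the part that matters.
sample = [
    "warning vpxd[7FB5BFF7E700] [Originator@6876 sub=Heartbeat] "
    "Received incorrect size for heartbeat Expected size (334) "
    "Received size (462) Host host-87",
    "warning vpxd[7FB5BFF7E700] [Originator@6876 sub=Heartbeat] "
    "Received incorrect size for heartbeat Expected size (334) "
    "Received size (590) Host host-87",
]

mismatches = list(find_size_mismatches(sample))
for host, expected, received in mismatches:
    print(host, expected, received, received - expected)
```

On a real system the lines would come from the vpxd.log file instead of the sample list.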

Suspected root cause

It looks like the latest VPX API uses digital signatures to verify the integrity and authenticity of the heartbeat messages ESXi hosts send to vCenter. Apparently, there are some hard expectations about the size of these heartbeat messages. They should be exactly 334 bytes long, which is true if a host uses a 2048-bit RSA key to sign the message.

Signing heartbeat messages with 3072-bit keys leads to a message size of 462 bytes, 128 bytes larger than in the 2048-bit case. Moving to 4096 bits grows the message by another 128 bytes, to 590 bytes. This is interesting, as it is exactly the growth one would expect for PKCS#1 encoded RSA signatures: the signature is as long as the key's modulus, so every additional 1024 bits of key size add 128 bytes.
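
The arithmetic can be checked quickly. The sketch below assumes that everything in the heartbeat except the signature has a constant size; that is my guess from the numbers in the log, not something I know about the actual packet layout. The names are made up for illustration.

```python
# Size vpxd expects, taken from the "Expected size (334)" log message.
EXPECTED_HEARTBEAT_SIZE = 334


def signature_size(key_bits):
    """An RSA signature is as long as the key's modulus, in bytes."""
    return key_bits // 8


# Assumption: the non-signature part of the heartbeat is constant.
# With a 2048-bit key that leaves 334 - 256 = 78 bytes.
FIXED_OVERHEAD = EXPECTED_HEARTBEAT_SIZE - signature_size(2048)


def predicted_heartbeat_size(key_bits):
    """Heartbeat size predicted for a given RSA key size."""
    return FIXED_OVERHEAD + signature_size(key_bits)


predicted = {bits: predicted_heartbeat_size(bits) for bits in (2048, 3072, 4096)}
print(predicted)
```

The predicted values for 3072 and 4096 bits match the 462 and 590 bytes reported in vpxd.log.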

I am not sure if the heartbeat packet is truncated before it is passed to VpxPublicKey::Verify or if the method itself makes hard assumptions about the payload size. It's probably the latter, which should be relatively easy to fix.

Of course, the heartbeat protocol should be tolerant to heartbeats having different packet sizes as long as RSA digital signatures are used to authenticate the message.

I do not know why VMware chose digital signatures over HMACs in this instance. HMACs should provide all necessary properties, with the benefit of fixed-size packets and less dependence on the choice of the asymmetric cipher.
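
To illustrate the fixed-size point: an HMAC tag has the length of the underlying hash output, no matter how long the message is or what kind of certificate the host uses. The following is a generic sketch using Python's standard library, not VMware's heartbeat protocol. The shared key and the message contents are hypothetical, and an HMAC scheme would need such a shared secret between host and vCenter in the first place.

```python
import hashlib
import hmac

# Hypothetical secret shared between the host and vCenter.
shared_key = b"hypothetical-shared-secret"


def tag(message):
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(message, received_tag):
    """Constant-time check that the tag matches the message."""
    return hmac.compare_digest(tag(message), received_tag)


short_tag = tag(b"heartbeat from host-42")
long_tag = tag(b"heartbeat from host-87 " * 100)

# Both tags have the SHA-256 digest length, independent of message size.
print(len(short_tag), len(long_tag))
print(verify(b"heartbeat from host-42", short_tag))
print(verify(b"tampered heartbeat", short_tag))
```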

1 Reply
gurugti
Contributor

Change the key length of the certificate from any higher value to 2048 bits. That should fix the issue. vCenter treats the heartbeat coming from the host as an insecure heartbeat.

Guru
