Raised a support ticket with VMware and got the answer. Leaving this here in case anyone else is also curious:
Regarding your query, please see the screenshot below as an example:
You are absolutely correct: when the ESXi host is using 2 NTP servers,
it looks at each server's stratum level and its deviation from the currently configured time, and then decides which source to sync with.
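
To make that concrete, here is a rough Python sketch of the idea as I understand it from the reply: prefer the lower stratum, then the smaller deviation. This is only an illustration and not how ESXi or its NTP daemon actually implements selection (the real NTP selection logic weighs more factors, such as delay and jitter). The `NtpPeer` class, the `pick_sync_source` function, the server names, and the numbers are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NtpPeer:
    host: str          # hypothetical server name
    stratum: int       # distance from a reference clock; lower is better
    offset_ms: float   # deviation from the host's current time, in milliseconds


def pick_sync_source(peers: List[NtpPeer]) -> Optional[NtpPeer]:
    """Toy ranking: lowest stratum first, then smallest absolute offset."""
    # In NTP, stratum 16 means "unsynchronized", so such peers are skipped.
    usable = [p for p in peers if 1 <= p.stratum < 16]
    if not usable:
        return None
    return min(usable, key=lambda p: (p.stratum, abs(p.offset_ms)))


# Example values are invented purely for illustration.
peers = [
    NtpPeer("ntp-a.example.com", stratum=3, offset_ms=1.2),
    NtpPeer("ntp-b.example.com", stratum=2, offset_ms=-4.8),
]
print(pick_sync_source(peers).host)  # prints "ntp-b.example.com"
```

In this toy version the stratum 2 server wins even though its offset is larger, because stratum is compared first. Whether the host really weights the two factors in that order is my assumption; the support reply does not say.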