Hi,
To begin with, I've been observing lately that the last two versions of Workstation have a lot of connectivity issues. This specifically occurs when snapshots are taken and restored, and/or when VMs are paused and resumed (after some hours): the VMs do not reconnect and require manually pinging each and every interface to resume connectivity. I also observed that this only happens where routing is involved, as I did not face any issues when all VMs are on the same network.
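For what it's worth, the manual workaround described above can be scripted. This is only a sketch run inside an affected guest; the gateway addresses are assumptions based on the lab subnets in this thread, not values confirmed anywhere:

```shell
# Sketch of the manual workaround: ping each gateway once so the virtual
# switches and the router relearn the resumed VM's MAC addresses.
# The gateway IPs below are placeholders -- substitute your own.
check_gw() {
    for gw in "$@"; do
        if ping -c 1 -W 2 "$gw" >/dev/null 2>&1; then
            echo "reachable: $gw"
        else
            echo "unreachable: $gw"
        fi
    done
}

check_gw 192.168.10.1 192.168.20.1 192.168.30.1
```

Running this once after resuming a VM saves pinging each interface by hand.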
I'm migrating from a Virtual Switch to a Distributed Switch in vCenter; the ESXi hosts have dual vmnics and dual vmk adapters.
I have two clusters with the following ESXi hosts.
Windows Server (DNS) - 192.168.10.2
vCenter - 192.168.10.5
Compute Cluster
* compute1.v.lab - vmk0 (192.168.30.10), vmk1 (192.168.30.11)
* compute2.v.lab - vmk0 (192.168.30.20), vmk1 (192.168.30.21)
Infrastructure Cluster
* infrastructure1.v.lab - vmk0 (192.168.20.10), vmk1 (192.168.20.11)
* infrastructure2.v.lab - vmk0 (192.168.20.20), vmk1 (192.168.20.21)
* infrastructure3.v.lab - vmk0 (192.168.20.30), vmk1 (192.168.20.31)
I'm following this https://www.youtube.com/watch?v=eDJ3OfXTkLs for migrating via the GUI; however, for some reason some ESXi hosts migrated successfully while others failed. Both sets of ESXi hosts have similar configurations. In my case, the compute cluster hosts migrated successfully, while the infrastructure cluster hosts failed with the following error.
Both vmk0 and vmk1 can ping vCenter, and vCenter can ping both interfaces of the ESXi host as well.
[root@infrastructure1:~] vmkping -I vmk0 192.168.10.5
PING 192.168.10.5 (192.168.10.5): 56 data bytes
64 bytes from 192.168.10.5: icmp_seq=0 ttl=63 time=0.992 ms
64 bytes from 192.168.10.5: icmp_seq=1 ttl=63 time=0.724 ms
64 bytes from 192.168.10.5: icmp_seq=2 ttl=63 time=0.720 ms
--- 192.168.10.5 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.720/0.812/0.992 ms
[root@infrastructure1:~] vmkping -I vmk1 192.168.10.5
PING 192.168.10.5 (192.168.10.5): 56 data bytes
64 bytes from 192.168.10.5: icmp_seq=0 ttl=63 time=0.731 ms
64 bytes from 192.168.10.5: icmp_seq=1 ttl=63 time=0.895 ms
64 bytes from 192.168.10.5: icmp_seq=2 ttl=63 time=1.497 ms
--- 192.168.10.5 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.731/1.041/1.497 ms
PS C:\Users\Administrator> ssh root@192.168.10.5
Command> ping 192.168.20.10
PING 192.168.20.10 (192.168.20.10) 56(84) bytes of data.
64 bytes from 192.168.20.10: icmp_seq=1 ttl=63 time=1.06 ms
64 bytes from 192.168.20.10: icmp_seq=2 ttl=63 time=1.19 ms
64 bytes from 192.168.20.10: icmp_seq=3 ttl=63 time=0.686 ms
64 bytes from 192.168.20.10: icmp_seq=4 ttl=63 time=0.833 ms
^C
--- 192.168.20.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 9ms
rtt min/avg/max/mdev = 0.686/0.942/1.194/0.196 ms
Command> ping 192.168.20.11
PING 192.168.20.11 (192.168.20.11) 56(84) bytes of data.
64 bytes from 192.168.20.11: icmp_seq=1 ttl=63 time=0.798 ms
64 bytes from 192.168.20.11: icmp_seq=2 ttl=63 time=1.11 ms
64 bytes from 192.168.20.11: icmp_seq=3 ttl=63 time=1.13 ms
64 bytes from 192.168.20.11: icmp_seq=4 ttl=63 time=0.689 ms
^C
--- 192.168.20.11 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 33ms
rtt min/avg/max/mdev = 0.689/0.931/1.129/0.195 ms
vCenter Log
2020-05-15T08:05:11.155Z info vpxd[15507] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-74856499] [VpxdHostCnx] No heartbeats received from host; cnx: 5219c9b7-862a-17bf-2de1-2b471e0435a1, h: host-1029, time since last heartbeat: 2634158ms
2020-05-15T08:05:11.156Z info vpxd[15507] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-74856499] [VpxdHostCnx] No heartbeats received from host; cnx: 52fdf7fe-5824-585f-d256-f62222ad4478, h: host-1026, time since last heartbeat: 2633936ms
2020-05-15T08:05:33.379Z info vpxd[32047] [Originator@6876 sub=HostGateway] CmConnectionFSM::RunFSM(ST_CM_CALL_FAILED)
2020-05-15T08:05:33.413Z warning vpxd[15514] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /. Request from 192.168.10.5.
2020-05-15T08:05:33.431Z error vpxd[32047] [Originator@6876 sub=HostGateway] [CisConnection]: ComponentManager->LoginByToken failed: HTTP error response: Bad Request
2020-05-15T08:05:33.431Z warning vpxd[32047] [Originator@6876 sub=HostGateway] State(ST_CM_LOGIN) failed with: HTTP error response: Bad Request
2020-05-15T08:05:33.569Z warning vpxd[14404] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /. Request from 192.168.10.5.
2020-05-15T08:05:33.571Z error vpxd[32047] [Originator@6876 sub=HostGateway] [CisConnection]: ComponentManager->LoginByToken failed: HTTP error response: Bad Request
2020-05-15T08:05:33.571Z warning vpxd[32047] [Originator@6876 sub=HostGateway] State(ST_CM_LOGIN) failed with: HTTP error response: Bad Request
2020-05-15T08:05:33.678Z warning vpxd[32043] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /. Request from 192.168.10.5.
2020-05-15T08:05:33.680Z error vpxd[32047] [Originator@6876 sub=HostGateway] [CisConnection]: ComponentManager->LoginByToken failed: HTTP error response: Bad Request
2020-05-15T08:05:33.680Z warning vpxd[32047] [Originator@6876 sub=HostGateway] State(ST_CM_LOGIN) failed with: HTTP error response: Bad Request
2020-05-15T08:05:33.795Z warning vpxd[32044] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /. Request from 192.168.10.5.
2020-05-15T08:05:33.797Z error vpxd[32047] [Originator@6876 sub=HostGateway] [CisConnection]: ComponentManager->LoginByToken failed: HTTP error response: Bad Request
2020-05-15T08:05:33.797Z warning vpxd[32047] [Originator@6876 sub=HostGateway] State(ST_CM_LOGIN) failed with: HTTP error response: Bad Request
2020-05-15T08:05:33.800Z warning vpxd[32047] [Originator@6876 sub=HostGateway] Ignoring exception during refresh of HostGateway cache: N7Vmacore4Http13HttpExceptionE(HTTP error response: Bad Request)
-->
2020-05-15T08:05:49.459Z info vpxd[14459] [Originator@6876 sub=Health] Wrote vpxd health XML to file /etc/vmware-sca/health/vmware-vpxd-health-status.xml. Status: YELLOW. Expiration: 5267
2020-05-15T08:05:51.918Z info vpxd[14446] [Originator@6876 sub=vpxLro opID=q-1952:h5ui-getProperties:urn:vmomi:HostSystem:host-1032:41c6ed99-ab4e-45ef-af71-e6993ff2ddda:1349277837:01-61] [VpxLRO] --
2020-05-15T08:06:23.037Z info vpxd[14767] [Originator@6876 sub=vpxLro opID=sps-Main-496359-494-437f-31] [VpxLRO] -- FINISH lro-9879
2020-05-15T08:06:26.069Z info vpxd[14436] [Originator@6876 sub=VapiEndpoint.HTTPService.HttpConnection] HTTP Connection read failed while waiting for further requests [N7Vmacore4Http14HttpConnectionE:0x00007f5004851010]: N7Vmacore16TimeoutExceptionE(Operation timed out: Stream: <io_obj p:0x000055917ab251c8, h:-1, <TCP '127.0.0.1 : 8093'>, <TCP '127.0.0.1 : 38974'> FD Closed>, duration: 00:00:45.995834 (hh:mm:ss.us))
--> [context]zKq7AVECAAAAAK9N4wAMdnB4ZAAAHHMubGlidm1hY29yZS5zbwAAh9QZAPvxGACHWxYAH1QYAM0DJQDqFiMAbL0iAOANIwBeWioBB3wAbGlicHRocmVhZC5zby4wAAIfKQ9saWJjLnNvLjYA[/context]
2020-05-15T08:06:26.069Z info vpxd[14527] [Originator@6876 sub=VapiEndpoint.HTTPService.HttpConnection] HTTP Connection read failed while waiting for further requests [N7Vmacore4Http14HttpConnectionE:0x00007f4ffc0a2670]: N7Vmacore16TimeoutExceptionE(Operation timed out: Stream: <io_obj p:0x00007f50043c2368, h:-1, <TCP '127.0.0.1 : 8093'>, <TCP '127.0.0.1 : 38998'> FD Closed>, duration: 00:00:45.971136 (hh:mm:ss.us))
--> [context]zKq7AVECAAAAAK9N4wAMdnB4ZAAAHHMubGlidm1hY29yZS5zbwAAh9QZAPvxGACHWxYAH1QYAM0DJQDqFiMAbL0iAOANIwBeWioBB3wAbGlicHRocmVhZC5zby4wAAIfKQ9saWJjLnNvLjYA[/context]
2020-05-15T08:06:27.136Z info vpxd[14384] [Originator@6876 sub=vpxLro opID=opId-c716a-2210-54] [VpxLRO] -- BEGIN lro-9881 -- SessionManager -- vim.SessionManager.sessionIsActive -- 52262bbd-e867-b55c-a353-211fcbd234e7(52469f4d-189b-1837-f76d-c689feac716a)
2020-05-15T08:06:27.137Z info vpxd[14384] [Originator@6876 sub=vpxLro opID=opId-c716a-2210-54] [VpxLRO] -- FINISH lro-9881
Thanks.
Moderator: Thread moved to the vSphere vNetwork area.
An update on this: I tried migrating on a fresh setup with dual vmnics and a single vmk, and surprisingly it migrated successfully.
Sadly, it only worked that once. It's as if vCenter behaves randomly: sometimes it works, sometimes it doesn't; sometimes it works on one host and not on another. Or am I not following the right procedure?
Is there a standard way to migrate from a vSwitch to a DVSwitch?
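For anyone who prefers a fallback to the GUI wizard, the classic CLI approach migrates one uplink at a time so the host never loses all connectivity. This is only a sketch: the switch name `DSwitch`, the uplink dvPort ID `16`, and the vmnic numbers are placeholders to replace with values from your own environment.

```shell
# Show the current vSwitch/uplink layout on the host.
esxcfg-vswitch -l

# Unlink one uplink from the standard switch...
esxcfg-vswitch -Q vmnic1 vSwitch0

# ...then attach it to the distributed switch on a free uplink dvPort.
# (Free uplink port IDs are visible in the vSphere Client under the DVS
# uplink port group; "16" here is a placeholder.)
esxcfg-vswitch -P vmnic1 -V 16 DSwitch

# Verify the host now sees the DVS with the uplink attached.
esxcli network vswitch dvs vmware list
```

With one uplink on the DVS, the vmk adapters and the remaining uplink can then be migrated, verifying connectivity with vmkping after each step.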
Thanks.
This may help: How to Do the Old Switcheroo: Migrating vSS to vDS with Zero Downtime - VMware on VMware
Thanks scott28tt, the link was helpful in understanding the migration.
I finally found the problem (no solution yet). The issue is that one of the vmnics in each ESXi host does not migrate, which causes the failure; why that is, I fail to understand. I discovered this by manually migrating each vmnic individually until the last one, vmnic3, which has the same status as vmnic2 (unconfigured, not attached to any switch). Surprisingly, vmnic2 migrated successfully while vmnic3 fails, and I'm not sure why.
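A few host-side checks that may explain why one uplink migrates and its apparently identical twin does not (all output is host-specific; this is just where I'd look):

```shell
# Compare the two uplinks before retrying the migration: a link-down NIC or
# a lingering vSwitch membership would explain vmnic2 succeeding while
# vmnic3 fails.
esxcli network nic list                  # admin/link status, driver, speed
esxcli network vswitch standard list     # is vmnic3 still claimed by a vSS?
esxcli network vswitch dvs vmware list   # did vmnic2 actually land on the DVS?
```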
This only happens in deployments where the ESXi hosts are in a different subnet than vCenter and a router sits between them; the vCenter logs show a lot of missed heartbeats. When the ESXi hosts and vCenter are in the same subnet, everything works fine.
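One possible angle on the missed heartbeats: host-to-vCenter heartbeats travel over UDP port 902, so successful ICMP (as the vmkping output shows) does not prove the heartbeat traffic crosses the router. A sketch of checks, assuming the vCenter address from this thread (192.168.10.5):

```shell
# Confirm the outbound heartbeat firewall ruleset is enabled on the host.
esxcli network firewall ruleset list | grep -i heartbeat

# Capture outgoing heartbeats on the management vmk while the host is
# connected to vCenter.
tcpdump-uw -i vmk0 -n udp port 902 and host 192.168.10.5
```

If packets leave vmk0 but vCenter still logs missed heartbeats, the next place to look would be the router's firewall/NAT rules and a capture on the vCenter side.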
Any thoughts? Thanks.