VMware Cloud Community
jasmeetsinghsur
Enthusiast

ESXi went unresponsive after vMotion.

Hi,

The host became unresponsive soon after a vMotion. The same NIC carries both vMotion and management traffic. Below is a snippet from the hostd log on the affected host.

2019-11-06T13:17:46.821Z verbose hostd[12BC2B70] [Originator@6876 sub=PropertyProvider opID=70314485 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809419. Applied change to temp map.
2019-11-06T13:18:06.825Z verbose hostd[12E81B70] [Originator@6876 sub=PropertyProvider opID=703144bd user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809437. Applied change to temp map.
2019-11-06T13:18:06.827Z verbose hostd[12740B70] [Originator@6876 sub=PropertyProvider opID=703144be user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809438. Applied change to temp map.
2019-11-06T13:18:06.828Z verbose hostd[12E81B70] [Originator@6876 sub=PropertyProvider opID=703144bf user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809439. Applied change to temp map.
2019-11-06T13:18:26.823Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=703144e6 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809447. Applied change to temp map.
2019-11-06T13:18:26.825Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=703144e7 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809448. Applied change to temp map.
2019-11-06T13:18:46.824Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=7031450b user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809453. Applied change to temp map.
2019-11-06T13:18:46.826Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=7031450c user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809454. Applied change to temp map.
2019-11-06T13:19:06.827Z verbose hostd[12740B70] [Originator@6876 sub=PropertyProvider opID=70314536 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809468. Applied change to temp map.
2019-11-06T13:19:06.829Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=70314537 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809469. Applied change to temp map.
2019-11-06T13:19:26.829Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=70314549 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809476. Applied change to temp map.
2019-11-06T13:19:26.831Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=7031454a user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809477. Applied change to temp map.
2019-11-06T13:19:46.829Z verbose hostd[130C2B70] [Originator@6876 sub=PropertyProvider opID=70314557 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809482. Applied change to temp map.
2019-11-06T13:19:46.832Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=70314558 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809483. Applied change to temp map.
2019-11-06T13:20:06.827Z verbose hostd[13440B70] [Originator@6876 sub=PropertyProvider opID=703145ae user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809506. Applied change to temp map.

3 Replies
a_p_
Leadership

The host went unresponsive soon after vMotion.

Did this happen after vMotion finished, or after starting vMotion?

Please remember that vMotion traffic can easily saturate a NIC and should therefore be on its own NIC, or run on a Distributed Switch, where Network I/O Control manages bandwidth usage.
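
To put a rough number on that, here is a minimal back-of-the-envelope sketch. It only computes a lower bound on how long a memory copy keeps a link fully busy; the 16 GB VM size and the helper name `min_transfer_seconds` are hypothetical, and protocol overhead and re-copying of dirtied memory pages are ignored, so real migrations take longer.

```python
def min_transfer_seconds(memory_gb, link_gbps):
    """Lower bound (seconds) to copy memory_gb (decimal GB) over a link_gbps link.

    Assumes the link is fully available and ignores protocol overhead
    and repeated copies of memory pages that change during migration.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return memory_gb * 8 / link_gbps  # GB -> Gbit, then divide by Gbit/s


# Hypothetical 16 GB VM on a 1 Gbit/s and a 10 Gbit/s uplink.
for link in (1, 10):
    print(f"16 GB over {link} Gbit/s: at least {min_transfer_seconds(16, link):.1f} s")
```

On a shared 1 Gbit/s uplink this means management traffic would compete with a saturated link for about two minutes per migration in this example.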

André

Maio312
Contributor

Have you run esxtop while the issue occurs?

nishant
jasmeetsinghsur
Enthusiast

The host becomes unresponsive after vMotion. vmk0 serves both management and vMotion traffic. The host becomes unresponsive only after a virtual machine has been successfully migrated through vMotion. The NIC state does not change: the link stays up, but ICMP drops are observed.

Could queries from a monitoring tool be the cause?

The expected timestamp of the failure is 2019-11-06T13:19.

Snippet from the hostd log:

2019-11-06T13:19:06.827Z verbose hostd[12740B70] [Originator@6876 sub=PropertyProvider opID=70314536 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809468. Applied change to temp map.
2019-11-06T13:19:06.829Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=70314537 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809469. Applied change to temp map.
2019-11-06T13:19:26.829Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=70314549 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809476. Applied change to temp map.
2019-11-06T13:19:26.831Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=7031454a user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809477. Applied change to temp map.
2019-11-06T13:19:46.829Z verbose hostd[130C2B70] [Originator@6876 sub=PropertyProvider opID=70314557 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809482. Applied change to temp map.
2019-11-06T13:19:46.832Z verbose hostd[134C2B70] [Originator@6876 sub=PropertyProvider opID=70314558 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809483. Applied change to temp map.
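
One way to check whether hostd itself stopped logging around the failure time is to look for gaps between consecutive log entries. The following is a minimal sketch, not a VMware tool: the function names and the 30-second threshold are assumptions, and the sample lines are copied from the snippet above.

```python
import re
from datetime import datetime

# Leading UTC timestamp of a hostd log line, e.g. "2019-11-06T13:19:06.827Z verbose ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s")


def parse_timestamps(lines):
    """Return the timestamps of all lines that start with a hostd-style timestamp."""
    stamps = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    return stamps


def find_gaps(lines, threshold_s=30.0):
    """Return (previous, next, seconds) for consecutive entries more than threshold_s apart."""
    stamps = parse_timestamps(lines)
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > threshold_s:  # threshold is an assumption; tune it to the log cadence
            gaps.append((prev, cur, delta))
    return gaps


# Three entries taken from the snippet above.
SAMPLE = [
    "2019-11-06T13:19:06.827Z verbose hostd[12740B70] [Originator@6876 sub=PropertyProvider opID=70314536 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809468. Applied change to temp map.",
    "2019-11-06T13:19:26.829Z verbose hostd[12B40B70] [Originator@6876 sub=PropertyProvider opID=70314549 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809476. Applied change to temp map.",
    "2019-11-06T13:19:46.829Z verbose hostd[130C2B70] [Originator@6876 sub=PropertyProvider opID=70314557 user=vpxuser] RecordOp ASSIGN: info, haTask--vim.InternalStatsCollector.queryLatestVmStats-185809482. Applied change to temp map.",
]

for prev, cur, delta in find_gaps(SAMPLE):
    print(f"gap of {delta:.1f} s between {prev} and {cur}")
```

With these sample lines no gap is reported, because the entries are about 20 seconds apart. Running the same check over the complete hostd.log would show whether logging paused anywhere near 13:19.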

Actions taken:

> Upgraded the I/O device drivers on the host to the latest versions per the VMware HCL. Still the same.

> Heartbeat communication on UDP port 902 is working normally. The vpxd configuration is correct.

The hardware vendor advised running the following NetQueue commands on the host and observing the result:

esxcfg-module -s enable_default_queue_filters=0 qfle3
esxcli system settings kernel set --setting="netNetqueueEnabled" --value="FALSE"

VMware advised upgrading the I/O device driver to the latest version per the HCL and then capturing packets during vMotion.

The timestamp of the failure is 2019-11-06T13:19.
