VMware Cloud Community
MtheG92
Enthusiast
Enthusiast
Jump to solution

No vMotion log entries

Hi folks

Usually when I troubleshoot vMotion issues I check following logs on both participating ESXi hosts:

/var/log/vpxa.log

/var/log/hostd.log

/var/log/vmkernel.log

And sometimes when needed also the vCenter log:

/var/log/vmware/vpxd/vpxd.log

And, the VM related log file: vmware.log

Today I wanted to troubleshoot a very slow cold xvMotion situation in an environment where the migration goes from a vSphere 6.7 U3 vSAN enabled cluster to a vSphere 6.5 U2 SAN backed Storage Array cluster with a very low bandwidth performance. The process succeeds but takes to much time to complete (that’s why I troubleshoot Smiley Happy).

The only problem I have is, that there is not only one log entry regarding to the running vMotion process found in the initially described log files nor in the archived (.gz) log files.

I checked the host log level which was set to its default (Info), then I changed it to verbose, restarted the hostd service and started another vMotion process and got still the same problem - no vMotion related log entries available. The hosts are running fine and other log entries are created but nothing regarding the vMotion process.

Do I miss something? Had anyone the same issue/behavior? Is this related to the cold xvMotion?

Kind regards,

MtheG

Tags (3)
Reply
0 Kudos
1 Solution

Accepted Solutions
MtheG92
Enthusiast
Enthusiast
Jump to solution

I played a little bit around and analyzed a ton of log files and got some understanding on this observed behavior.

First of all, we need to clarify that a cold xvMotion process is not a hot xvMotion process! There is a “little-big” difference under the hood. Cold xvMotion uses the Network File Service (NFC) where Hot xvMotion uses the vMotion protocol.

As stated in the VMware documentation:

Cold Migration

Moving a powered off or suspended virtual machine to a new host. Optionally, you can relocate configuration and disk files for powered off or suspended virtual machines to new storage locations. You can also use cold migration to move virtual machines from one virtual switch to another, and from one data center to another. You can perform cold migration manually or you can schedule a task.

Hot Migration

Moving a powered on virtual machine to a new host. Optionally, you can also move the virtual machine disks or folder to a different datastore. Hot migration is also called live migration or vMotion. With vMotion, you migrate the virtual machine without any interruption in its availability.

From a network stack perspective, if a cold migration is triggered, the traffic goes via the Management service enabled interface (if no Provisioning TcpIp stack is configured). That should not be a big deal if you use a 10GbE or faster network on the Management subnet.

So, I started to look out for some NFC related logs and found something in vpxa.log:

info vpxa[2100002] [Originator@6876 sub=vpxNfcServer] PROXY connection to NFC: found session ticket:

info vpxa[2100426] [Originator@6876 sub=vpxLro opID=lro-19853-1bb016c2] [VpxLRO] -- BEGIN lro-19853 --  -- VpxNfcServerLro --

info vpxa[2100426] [Originator@6876 sub=Libs opID=lro-19853-1bb016c2] Optimize socket buffer size.

info vpxa[2100426] [Originator@6876 sub=Libs opID=lro-19853-1bb016c2] NfcNet_OptimizeSocketBuffer: Setting buffer size using RTT=0 ms BANDWIDTH=134000 bytes/ms BUFFER_SIZE=40KB

info vpxa[2100426] [Originator@6876 sub=vpxNfcServer opID=lro-19853-1bb016c2] Successfully initialized nfc callback for a write to the socket to be invoked on a separate thread

info vpxa[2100426] [Originator@6876 sub=vpxNfcServer opID=lro-19853-1bb016c2] Plugin started

info vpxa[2100426] [Originator@6876 sub=vpxLro opID=lro-19853-1bb016c2] [VpxLRO] -- FINISH lro-19853

--

info vpxa[2100000] [Originator@6876 sub=vpxLro opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] [VpxLRO] -- BEGIN task-4279 -- nfcManager -- nfc.NfcManager.copy -- 52e1dd90-fb4c-1216-83ac-cd8068dd55f1

info vpxa[2100000] [Originator@6876 sub=Libs opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Optimize socket buffer size.

info vpxa[2100000] [Originator@6876 sub=Libs opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] NfcNet_OptimizeSocketBuffer: Setting buffer size using RTT=11 ms BANDWIDTH=134000 bytes/ms BUFFER_SIZE=1512KB

info vpxa[2100000] [Originator@6876 sub=NfcManager opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Pre-processing copy spec {source-ip}:ds:///vmfs/volumes/5f7850e9-d90af766-a301-92a9fca0002c/{vm-folder}.vmdk -> {destination-ip}:ds:///vmfs/volumes/vsan:52dbd9c07edeecfa-24e791177ca19853/d1d48f5f-a820-a6a9-5cc8-6a2167900002/{vm-folder}.vmdk

info vpxa[2100000] [Originator@6876 sub=NfcManager opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Copy operation completed successfully

info vpxa[2100000] [Originator@6876 sub=vpxLro opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] [VpxLRO] -- FINISH task-4279

What a wonder, there are some log entries!

Kind regards,

MtheG

View solution in original post

Reply
0 Kudos
3 Replies
ashilkrishnan
VMware Employee
VMware Employee
Jump to solution

Hi

Do we notice recent logging happening in these logs or are they outdated ?

Does it log events if you do a regular vMotion(Host to host) to another ESXi host in the cluster? Please try to migrate this VM to another host and check if it logs events during storage vMotion

Reply
0 Kudos
MtheG92
Enthusiast
Enthusiast
Jump to solution

Hi ashilkrishnan

Do we notice recent logging happening in these logs or are they outdated ?

Yes, there are recent log entries.

Does it log events if you do a regular vMotion(Host to host) to another ESXi host in the cluster?

Not realy, only a few hostd entries about change state from the affected VM.

Please try to migrate this VM to another host and check if it logs events during storage vMotion

Done, same behavior.

Normally I would expect something like this:

2020-10-16T11:56:12.912Z cpu39:3479990)Migrate: vm 3479991: 3885: Setting VMOTION info: Source ts = 32100428654216515, src ip = <aaa.aaa.aaa.aaa> dest ip = <bbb.bbb.bbb.bbb> Dest wid = 3474032 using SHARED swap, encrypted

2020-10-16T11:56:12.913Z cpu28:3480948)MigrateNet: 1751: 32100428654216515 S: Successfully bound connection to vmknic vmk1 - 'aaa.aaa.aaa.aaa'

2020-10-16T11:56:12.913Z cpu28:3480948)MigrateNet: 1751: 32100428654216515 S: Successfully bound connection to vmknic vmk1 - 'aaa.aaa.aaa.aaa'

2020-10-16T11:56:12.913Z cpu28:3480948)VMotionUtil: 5199: 32100428654216515 S: Stream connection 1 added.

2020-10-16T11:56:26.111Z cpu2:3479991)VMotion: 5367: 32100428654216515 S: Another pre-copy iteration needed with 176070 pages left to send (prev2 8388608, prev 8388608, pages dirtied by pass through device 0, network bandwidth ~1229.474 MB/s, 13478% t2d)

2020-10-16T11:56:26.599Z cpu2:3479991)VMotion: 5277: 32100428654216515 S: Stopping pre-copy: only 13404 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~1441.895 MB/s, 1262% t2d)

2020-10-16T11:56:27.029Z cpu25:3480948)VMotionSend: 5095: 32100428654216515 S: Sent all modified pages to destination (network bandwidth ~1425.207 MB/s)

But that's not the case - why?

Are these entries only available during a hot vMotion? Because I am running cold xvMotion at the moment?

Kind regards,

MtheG

Reply
0 Kudos
MtheG92
Enthusiast
Enthusiast
Jump to solution

I played a little bit around and analyzed a ton of log files and got some understanding on this observed behavior.

First of all, we need to clarify that a cold xvMotion process is not a hot xvMotion process! There is a “little-big” difference under the hood. Cold xvMotion uses the Network File Service (NFC) where Hot xvMotion uses the vMotion protocol.

As stated in the VMware documentation:

Cold Migration

Moving a powered off or suspended virtual machine to a new host. Optionally, you can relocate configuration and disk files for powered off or suspended virtual machines to new storage locations. You can also use cold migration to move virtual machines from one virtual switch to another, and from one data center to another. You can perform cold migration manually or you can schedule a task.

Hot Migration

Moving a powered on virtual machine to a new host. Optionally, you can also move the virtual machine disks or folder to a different datastore. Hot migration is also called live migration or vMotion. With vMotion, you migrate the virtual machine without any interruption in its availability.

From a network stack perspective, if a cold migration is triggered, the traffic goes via the Management service enabled interface (if no Provisioning TcpIp stack is configured). That should not be a big deal if you use a 10GbE or faster network on the Management subnet.

So, I started to look out for some NFC related logs and found something in vpxa.log:

info vpxa[2100002] [Originator@6876 sub=vpxNfcServer] PROXY connection to NFC: found session ticket:

info vpxa[2100426] [Originator@6876 sub=vpxLro opID=lro-19853-1bb016c2] [VpxLRO] -- BEGIN lro-19853 --  -- VpxNfcServerLro --

info vpxa[2100426] [Originator@6876 sub=Libs opID=lro-19853-1bb016c2] Optimize socket buffer size.

info vpxa[2100426] [Originator@6876 sub=Libs opID=lro-19853-1bb016c2] NfcNet_OptimizeSocketBuffer: Setting buffer size using RTT=0 ms BANDWIDTH=134000 bytes/ms BUFFER_SIZE=40KB

info vpxa[2100426] [Originator@6876 sub=vpxNfcServer opID=lro-19853-1bb016c2] Successfully initialized nfc callback for a write to the socket to be invoked on a separate thread

info vpxa[2100426] [Originator@6876 sub=vpxNfcServer opID=lro-19853-1bb016c2] Plugin started

info vpxa[2100426] [Originator@6876 sub=vpxLro opID=lro-19853-1bb016c2] [VpxLRO] -- FINISH lro-19853

--

info vpxa[2100000] [Originator@6876 sub=vpxLro opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] [VpxLRO] -- BEGIN task-4279 -- nfcManager -- nfc.NfcManager.copy -- 52e1dd90-fb4c-1216-83ac-cd8068dd55f1

info vpxa[2100000] [Originator@6876 sub=Libs opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Optimize socket buffer size.

info vpxa[2100000] [Originator@6876 sub=Libs opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] NfcNet_OptimizeSocketBuffer: Setting buffer size using RTT=11 ms BANDWIDTH=134000 bytes/ms BUFFER_SIZE=1512KB

info vpxa[2100000] [Originator@6876 sub=NfcManager opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Pre-processing copy spec {source-ip}:ds:///vmfs/volumes/5f7850e9-d90af766-a301-92a9fca0002c/{vm-folder}.vmdk -> {destination-ip}:ds:///vmfs/volumes/vsan:52dbd9c07edeecfa-24e791177ca19853/d1d48f5f-a820-a6a9-5cc8-6a2167900002/{vm-folder}.vmdk

info vpxa[2100000] [Originator@6876 sub=NfcManager opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] Copy operation completed successfully

info vpxa[2100000] [Originator@6876 sub=vpxLro opID=kg6l9mc7-3894059-auto-2bgoc-h5:70248532-f8-01-ea] [VpxLRO] -- FINISH task-4279

What a wonder, there are some log entries!

Kind regards,

MtheG

Reply
0 Kudos