RajeevVCP4 Hot Shot
vExpert

vMotion failed

Posted by RajeevVCP4 Feb 22, 2017

Live example

If you see this backtrace in the vpxd logs:

 

migration to complete before retrying.

--> Failed to create a migrate heap of size 36297256: Not found

--> Failed to allocate migration heap.

--> Failed to initialize migration at source. Error 0xbad00a4. VMotion failed to start due to lack of cpu or memory resources
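To find these messages across several log files at once, a small shell helper along these lines may help. It is only a sketch: the function name is made up for illustration, and the search strings are copied from the messages quoted above.

```shell
# Print every log line that matches one of the migration-heap failures
# quoted above, prefixed with file name and line number.
# find_migrate_heap_errors is an illustrative name, not a VMware tool.
find_migrate_heap_errors() {
    grep -H -n -E \
        'Failed to create a migrate heap|Failed to allocate migration heap|Error 0xbad00a4' \
        "$@"
}

# Example (run in the directory that holds the vpxd logs):
#   find_migrate_heap_errors vpxd-*.log
```

If nothing is printed, none of the three messages is present in the files you passed.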

 

 

This issue occurs when DRS starts multiple simultaneous migrations of virtual machines and the resources they need are locked at that time.

 

The only solution is to shut down all running virtual machines on the host and retry the migration. If it still fails, reboot the host.

In this environment, the host (a Cisco UCS B200 M4 blade server) disconnected from vCenter Server.

This is a bug that produces errors like the following in the vmkernel log:


vmhba2

2017-02-17T14:49:04.019Z cpu20:996690)<7>fnic : 1 :: Abort Cmd called FCID 0xffffffff, LUN 0x4 TAG 3c6 flags 3

2017-02-17T14:49:04.019Z cpu20:996690)<7>fnic : 1 :: Returning from abort cmd type 0 FAILED

2017-02-17T14:49:04.019Z cpu20:996690)WARNING: LinScsi: SCSILinuxAbortCommands:1882: Failed, Driver fnic, for vmhba1
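To check whether a host is logging the same failures, something like the following can be run against the vmkernel log. This is a rough sketch: the function name is hypothetical, and the patterns are simply taken from the two failure lines shown above, so other fnic messages will not be caught.

```shell
# Print vmkernel log lines that report a failed fnic abort, using the
# two message shapes quoted above. find_fnic_abort_failures is an
# illustrative name.
find_fnic_abort_failures() {
    grep -n -E \
        'fnic.*abort cmd.*FAILED|SCSILinuxAbortCommands.*Failed, Driver fnic' \
        "$@"
}

# Example on an ESXi host, where the vmkernel log normally lives at
# /var/log/vmkernel.log:
#   find_fnic_abort_failures /var/log/vmkernel.log
```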

 

Check the fnic firmware and driver: the vHBA driver should be at the Cisco-recommended version, 1.6.0.28.

In this scenario, first involve the storage team to check the pool size on the storage side. If the pool is nearly full, ask them to reclaim space.

In my case the vCenter Server version was lower than the ESXi build, so upgrade vCenter Server as well.

I upgraded it to vCenter Server 5.5 Update 3e, Build 4180647.

 

Storage error: the VMAX recovery team identified that pool 005 Prod-GP-R531 was 100% full. The "2A39" errors logged for the devices mean "write failing as it can’t get space".

GP Exception 13 in world 187416:vmm3:XXXXXXXX @ 0x418021d8adc7
xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)Backtrace for current CPU #16, worldID=187416, fp=0x0
xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bf20:[0x418021d8adc7]VmAnon_AllocVmmPages@vmkernel#nover+0x3b stack: 0x100008785, 0x192d6c43b, 0x4393e84a7000, 0x6, 0x4393e84a7100
xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bf80:[0x418021d18ad7]VMMVMKCall_Call@vmkernel#nover+0x157 stack: 0x4393e0c1bfec, 0x4024600000000, 0x418021d4a64b, 0xfffffffffc606d00, 0x0
xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)0x4393e0c1bfe0:[0x418021d4a6d2]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x0, 0xfffffffffc4074b3, 0x0, 0x0, 0x0
xxxx-xx-xxTxx:xx:xx.xxxZ cpu16:187416)Panic: 623: Halting PCPU 16.
xxxx-xx-xxTxx:xx:xx.xxxZ cpu20:187418)Panic: 514: Panic from another CPU (cpu 20, world 187418): ip=0x418021d4a2e2 randomOff=0x21c00000:

 

If you see the same entries in the core dump, follow this solution.

 

This issue is resolved in ESXi 6.5.0a.

 

Workaround

 

To work around this issue for virtual machines migrated to an ESXi 6.5 host from a previous version, recover the virtual machine from the ESXi 6.5 host and set Numa.FollowCoresPerSocket to 1 on all ESXi 6.5 hosts.
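From an SSH session, the advanced setting can be changed with esxcli. The wrapper below is only a sketch: the function name and the DRY_RUN switch are my own additions for illustration, and the option path /Numa/FollowCoresPerSocket assumes the usual esxcli spelling of Numa.FollowCoresPerSocket. Check it against the KB before running it on production hosts.

```shell
# Set Numa.FollowCoresPerSocket to 1 on the local ESXi host.
# With DRY_RUN=1 the command is only printed, not executed.
# set_follow_cores_per_socket and DRY_RUN are illustrative names.
set_follow_cores_per_socket() {
    cmd="esxcli system settings advanced set -o /Numa/FollowCoresPerSocket -i 1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Example:
#   DRY_RUN=1 set_follow_cores_per_socket   # show what would run
#   set_follow_cores_per_socket             # apply on this host
# To read the current value back afterwards:
#   esxcli system settings advanced list -o /Numa/FollowCoresPerSocket
```

The command has to be repeated on every ESXi 6.5 host, since the setting is per host.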

 

For further information, refer to this KB:

 

ESXi 6.5 host fails with PSOD: GP Exception 13 in multiple VMM world at VmAnon_AllocVmmPages (2147958) | VMware KB

 

In the Avamar Administrator interface a Virtual Machine is labeled with its VMX file location instead of its VM name (000485609)

 

Avamar Client for VMware

 

 

Issue: The VM name is displayed incorrectly for random VMs on the Avamar GUI.

 

The VM name appears similar to:

 

"%2fvmfs%2fvolumes%2f5124e1c8-f4fd5839-7764-b8ac6f8a9883%2fpoi%2fpoi1.vmx"
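The label is the VMX path with each slash written as %2f. To read it as a normal path, a one-line helper such as this can be used. It is a sketch with a made-up function name, and it only handles the %2f escape seen here, not URL encoding in general.

```shell
# Turn the %2f escapes in a mislabeled VM name back into slashes.
# Only %2f / %2F is handled; this is not a general URL decoder.
# decode_vmx_label is an illustrative name.
decode_vmx_label() {
    printf '%s\n' "$1" | sed 's|%2[fF]|/|g'
}

# Example:
#   decode_vmx_label '%2fvmfs%2fvolumes%2f5124e1c8-f4fd5839-7764-b8ac6f8a9883%2fpoi%2fpoi1.vmx'
#   prints /vmfs/volumes/5124e1c8-f4fd5839-7764-b8ac6f8a9883/poi/poi1.vmx
```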

 

In the vpxd log file, you see many references to VmRemovedEvent and VmRenamedEvent:

 

grep "Invalid host event" vpxd-*.log | grep "vim.event.VmRenamedEvent"
grep "Invalid host event" vpxd-*.log | grep "vim.event.VmRemovedEvent"

Resolution

 

Currently, there is no resolution.

VMware is aware of this issue and is working with backup solution teams to provide a solution.

To work around the issue, use these options:

  • Avoid vMotion for the backed up virtual machines.
  • Correct the backup set when necessary.
  • See KB 2148378 for details.

 

If you see this error when running an esxcli command:

 

Error interacting with configuration file /etc/vmware/lunTimestamps.log: Timeout while waiting for lock, /etc/vmware/lunTimestamps.log.LOCK, to be released. Another process has kept this file locked for more than 30 seconds. The process currently holding the lock is smartd(PID). This is likely a temporary condition. Please try your operation again
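Before touching smartd, it is worth confirming that smartd really is the process named in the message. The helper below is a small sketch (the function name is hypothetical) that pulls the holder's name out of an error message in the format shown above.

```shell
# Read an esxcli lock-timeout message on stdin and print the name of
# the process that holds the lock (the text before "(PID)").
# lock_holder is an illustrative name.
lock_holder() {
    sed -n 's/.*currently holding the lock is \([^(]*\)(.*/\1/p'
}

# Example, with the error text saved in a shell variable:
#   printf '%s\n' "$error_text" | lock_holder
```

If it prints smartd, the workaround described in this section applies. Nothing is printed when the input does not contain a message in this format.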


The ESXi host can also show as Disconnected in vCenter Server. This is a known issue in ESXi 6.0.



Solution


This is a known issue affecting ESXi 6.0.

Currently, there is no resolution.

 

To work around this issue, disable the smartd daemon on the host.

To disable the smartd daemon:

 

  1. Connect to the affected ESXi host with an SSH session.
  2. Run this command to stop the smartd service:

    /etc/init.d/smartd stop

  3. Run this command to stop the smartd service from starting on reboot:

    chkconfig smartd off

 




vMotion failed in vSphere

Posted by RajeevVCP4 Feb 12, 2017

This issue mostly occurs when a user attempts a vMotion from the vSphere Client (VC client) in a VMware vSphere 6 environment. There are two solutions.

 

Try the vMotion or Storage vMotion from the Web Client. If that fails, try the following method, but note that it is risky.

 

To resolve this issue, reset the CPUID Mask settings on the affected virtual machine.

 

To reset the CPUID Mask settings:

  1. Using the vSphere Client, connect to the vCenter Server and locate the affected virtual machine.
  2. Power off the virtual machine.
  3. Right-click the virtual machine and click Edit Settings > Options > CPUID Mask > Advanced.
  4. Click Reset All to Default to reset the CPUID Mask.
  5. Click OK > OK, then power on the virtual machine.

    The virtual machine now shows the correct EVC mode.

If the issue persists, upgrade the virtual machine's virtual hardware to the latest version.
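Before resetting the CPUID Mask, you may want to see whether the virtual machine carries any custom mask entries at all. The sketch below assumes that such overrides are stored as lines beginning with cpuid. in the virtual machine's .vmx file. That is my understanding rather than something stated in this post, and the function name is made up. It only reads the file and changes nothing.

```shell
# List lines in a .vmx file that start with "cpuid." (assumed to be
# where custom CPUID mask overrides are stored), with line numbers.
# list_cpuid_masks is an illustrative name.
list_cpuid_masks() {
    grep -i -n '^[[:space:]]*cpuid\.' "$1"
}

# Example (the path is a placeholder for your own VM's .vmx file):
#   list_cpuid_masks /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
```

No output means no such entries were found in that file.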