saravana_k_r
Contributor

VMotion not working


I have used VMotion in the past to move this same virtual machine from server1 to server2. I tried to move it from server1 to server2 again, but when it reaches 10% it stops with the error "A general system error occurred: Invalid fault". I am now unable to VMotion the virtual machine at all. Can you tell me what the problem might be?

Thank you

15 Replies
admin
Immortal

Is this happening for all VMs or just one?

Do cold migrations work?

Can the servers vmkping each other's vmkernel IP address?

From the ESX server command line, type: vmkping ipaddress

kjb007
Immortal

I agree with appk: when VMotion stops at 10%, you usually have a network issue. Run vmkping against your hosts and make sure name resolution is OK.

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
saravana_k_r
Contributor

I can vmkping the ESX hosts from the other ESX hosts, but VMotion still stops at 10%. What could be the problem?

dave01 (Accepted Solution)
Enthusiast

The first step is to vmkping from the host running the VM to the host you wish to VMotion to. Then try vmkping back the other way. Make sure you are pinging the VMkernel address, not the service console address, of the ESX host at the other end.

If you have multiple VMkernel addresses (for example, when you are running iSCSI), make sure you ping the one that has VMotion ticked.

If the first ping reply fails but the subsequent replies work, that is your external switches updating their ARP tables with that IP/MAC pairing for the first time; the VMotion will then probably work.

This tends to happen on more complex networks.

If there is no ping response at all, then you know there is a more serious network issue.
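The two-way check above can be sketched from the service console like this (the IP addresses are hypothetical placeholders, not taken from this thread; substitute your own hosts' VMkernel addresses):

```shell
# Run from server1's service console:
vmkping 10.0.0.2    # server2's VMkernel address, not its service console IP
# Then run from server2's service console:
vmkping 10.0.0.1    # server1's VMkernel address
# If only the very first reply is lost and the later replies succeed,
# the physical switches were likely just learning the IP/MAC pairing,
# as described above, and the VMotion will probably work on a retry.
```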

saravana_k_r
Contributor

Thanks, VMotion is now working properly.

arunram84
Contributor

Hi Dave

I am also facing a similar problem: my VMotion gets stuck at 10%, and after 10-15 minutes an error is thrown: "A general system error occurred: Invalid fault".

Cold migration from esx-1 to esx-2 on the shared storage works, but VMotion gets stuck.

You have said:

"If the first ping reply fails but the subsequent replies work, that is your external switches updating their ARP tables with that IP/MAC pairing for the first time; the VMotion will then probably work."

Can you elaborate on this, please? I am able to vmkping from esx-1 to esx-2's VMkernel and vice versa.

I am using iSCSI storage as the shared storage.

dave01
Enthusiast

The vmkping utility is a tool for testing the networking between the VMkernel interfaces on each ESX host.

In a nutshell, the VMotion traffic (copying the contents of the VM's RAM, etc.) is sent through the VMkernel port group/NIC, and you have to have that VMkernel interface enabled for VMotion.

This is separate from the service console of the ESX host.

What were the ping response times between the VMkernel IPs?

Also, do you have the LUN ID for each VMFS volume set correctly?
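To double-check which address is the VMkernel one and which port group it sits on, the ESX 3.x service console has listing commands (a sketch; output formats vary between releases):

```shell
# List VMkernel NICs with their IP addresses and netmasks:
esxcfg-vmknic -l
# Show the vSwitch/port-group layout, including the VMotion port group:
esxcfg-vswitch -l
```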

arunram84
Contributor

I have created the VMkernel port, and VMotion is enabled on it.

The response times for vmkping are:

esx-1 to the VMkernel IP of esx-2:

# vmkping 172.31.148.109

PING 172.31.148.109 (172.31.148.109): 56 data bytes

64 bytes from 172.31.148.109: icmp_seq=0 ttl=64 time=0.150 ms

64 bytes from 172.31.148.109: icmp_seq=1 ttl=64 time=0.127 ms

64 bytes from 172.31.148.109: icmp_seq=2 ttl=64 time=0.134 ms

esx-2 to the VMkernel IP of esx-1:

# vmkping 172.31.148.123

PING 172.31.148.123 (172.31.148.123): 56 data bytes

64 bytes from 172.31.148.123: icmp_seq=0 ttl=64 time=0.149 ms

64 bytes from 172.31.148.123: icmp_seq=1 ttl=64 time=0.134 ms

64 bytes from 172.31.148.123: icmp_seq=2 ttl=64 time=0.130 ms

Regarding the LUN IDs: the current LUN IDs are 0, and I did not see any option to edit them. How can we do this?

kjb007
Immortal

VMotion failing at 10% is usually indicative of some kind of network failure. If your vmkping works, then your networking appears to be in place. DNS problems can also cause this. Can every server you are trying to VMotion between resolve all the hostnames, both the short form and the fully qualified domain name? If you're not using DNS, then you have to make sure every server has /etc/hosts entries for all the servers. Other than that, make sure you don't have VMs on host-only networks, and that you don't have CDs or floppies attached to the VM you are trying to VMotion.
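The name-resolution part of the checklist above can be scripted; this is a minimal sketch, and the hostnames are hypothetical placeholders for your own ESX hosts:

```shell
#!/bin/sh
# Check that every host resolves under both its short and fully
# qualified name, whether via DNS or /etc/hosts.
for h in hostname1 hostname1.domainname.com hostname2 hostname2.domainname.com; do
    if getent hosts "$h" > /dev/null; then
        echo "$h: resolves"
    else
        echo "$h: NOT RESOLVED"
    fi
done
```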

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
arunram84
Contributor

Hi,

We are not using DNS, but please let me know what you meant by this:

If you're not using DNS, then you have to make sure all the servers are using /etc/hosts entries for all servers. Other than that, make sure you don't have vm's on host-only networks,

Currently, on both ESX servers, the contents of /etc/hosts are:

# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost

kjb007
Immortal

For every host that is part of your ESX infrastructure, you will need an entry in your /etc/hosts file similar to this:

x.x.x.1 hostname1 hostname1.domainname.com

x.x.x.2 hostname2 hostname2.domainname.com

Then, copy that to all of your hosts, and make the equivalent entries in your Windows VirtualCenter server's hosts file (c:\windows\system32\drivers\etc\hosts); the syntax there is the same.

Once that is done, make sure all of your hosts can ping each other using both the hostname1 and hostname1.domainname.com forms as well as the IP addresses. Your VMotion should then proceed successfully.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
dave01
Enthusiast

This is sound advice, as ESX relies heavily on DNS to resolve the other hosts.

The other option is to install a DNS server on your Windows management/VirtualCenter server, manually create the DNS records, and point the ESX hosts at that server for DNS; that way you only have to maintain the hostname/IP list in one location.

arunram84
Contributor

Thanks for the advice, but even after creating a DNS server and updating the DNS IPs, the same problem persists. I was wondering whether the problem is caused by the iSCSI volume connection dropping on one of the ESX servers: the iSCSI connection shows as dropped on one ESX server and as established on the other.

The output of vmkiscsi-ls on esx-2 is:

SESSION STATUS : DROPPED AT Mon May 12 17:35:45 2008

and on esx-1:

SESSION STATUS : ESTABLISHED AT Mon May 12 17:37:47 2008

kjb007
Immortal

All of the previous troubleshooting assumes that the storage is stable in all locations. You should check that your iSCSI configuration allows all of your hosts access to the same LUN. VMotion verifies storage at both the current and the new location, and if it cannot do so, the VMotion will fail.
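Assuming an ESX 3.x service console, the storage visibility can be compared host by host with commands like these (a sketch; run on each host and compare the output):

```shell
# Map VMFS volumes to their underlying SCSI devices/LUNs:
esxcfg-vmhbadevs -m
# For iSCSI shared storage, confirm the session state on this host:
vmkiscsi-ls | grep "SESSION STATUS"
```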

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mcorey3
Contributor

I had this same issue occur on a recent P2V conversion done via VMware Converter: the VM would not VMotion between hosts. After VMware support looked through my host logs, they saw that one of the 4 VMDKs attached to this VM had a wrong SCSI bus setting.

I ended up cloning the faulty "original" VM and testing VMotion on the "clone", and it worked. I then deleted the faulty "original" VM and recreated it from the "clone", and VMotion was operational again for this VM.

I'm not sure why the SCSI bus setting got screwy during the conversion, nor why it affected only the one disk and not the other 3. But this workaround worked; unfortunately, it did require the VM being down for about 2 hours.
