Hi
I'm getting an issue where VMs are failing to VMotion at 10%. I've had a look through the network settings and all seems OK; I think it's more of a storage issue. I'm getting the following errors in the vmkernel log:
Jun 25 09:16:12 MSC05ESX-VMSC vmkernel: 115:18:46:05.062 cpu0:1168)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Jun 25 09:17:13 MSC05ESX-VMSC vmkernel: 115:18:47:05.736 cpu0:1024)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Jun 25 09:17:13 MSC05ESX-VMSC vmkernel: 115:18:47:05.762 cpu0:1024)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Jun 25 10:22:24 MSC05ESX-VMSC vmkernel: 115:19:52:16.434 cpu5:1041)Config: 416: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
Jun 25 10:33:13 MSC05ESX-VMSC vmkernel: 115:20:03:05.565 cpu0:1228)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Jun 25 10:35:50 MSC05ESX-VMSC vmkernel: 115:20:05:43.018 cpu0:1024)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Any idea what the issue might be?
I do not think it is a storage problem, since the virtual disks stay where they are when VMotioning. Typically, when VMotion fails at 10% it is a communication problem between the VMkernel NICs that are enabled for VMotion. Are your servers ESX or ESXi? If they are ESX, go to the command line and issue the vmkping command to see if you can reach the VMkernel interface on the other ESX host. Ensure that the VMkernel ports are on the same subnet.
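To illustrate, a quick connectivity check from the ESX service console might look like the following. These are ESX 3.x console commands, and the IP address is a placeholder - substitute the other host's actual VMkernel IP:

```
# From the service console of host A, ping host B's VMkernel interface.
# vmkping uses the VMkernel TCP/IP stack, not the service console's,
# so it exercises the same path VMotion uses.
vmkping 10.11.97.13        # placeholder: host B's VMkernel IP

# List the VMkernel ports and their IPs/netmasks to confirm both hosts
# are on the same subnet
esxcfg-vmknic -l
```

If vmkping fails while a normal ping works, the problem is on the VMkernel network (vSwitch, VLAN, or physical path), not the service console network.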
If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
Here's a KB article worth checking out.
How's your hardware? The source and destination ESX hosts need to have a number of things that are compatible, including CPU makes and families (to a certain degree).
Thanks
Lofty
PS. If you found my response helpful, think about awarding some points.
24/0 indicates a reservation conflict on the VMFS LUN. Make sure the LUN is not hard locked (my own term for a SCSI-3 reservation on the storage array) - you may wish to get the SAN guys involved (I assume it's SAN).
If there is no persistent lock on the storage array, then you may have a bigger problem (one that will take much more time to put a long-term solution in place for).
Ask the SAN guys which hosts are conflicting in an attempt to reserve the LUN. Check /var/log/vmkernel on the other hosts in the farm - if it's something like "unable to power on - no swap file", then it's an HA problem. Reply back and I will walk you through it further.
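As a minimal sketch of hunting for those conflicts yourself: SCSI status 24 (0x18) is RESERVATION CONFLICT, so you can grep each host's vmkernel log for it and see which device path is affected. The sample lines below are the ones quoted earlier in this thread; on a real host, point `LOG` at /var/log/vmkernel instead.

```shell
# Scan a vmkernel log for SCSI status 24 (reservation conflict) entries
# and report which device path is affected.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jun 25 09:16:12 MSC05ESX-VMSC vmkernel: 115:18:46:05.062 cpu0:1168)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
Jun 25 10:22:24 MSC05ESX-VMSC vmkernel: 115:19:52:16.434 cpu5:1041)Config: 416: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
Jun 25 10:33:13 MSC05ESX-VMSC vmkernel: 115:20:03:05.565 cpu0:1228)StorageMonitor: 196: vmhba1:1:0:0 status = 24/0 0x0 0x0 0x0
EOF
# Field 9 of each StorageMonitor line is the vmhba device path;
# count conflict lines per device
grep 'status = 24' "$LOG" | awk '{print $9}' | sort | uniq -c
conflicts=$(grep -c 'status = 24' "$LOG")
rm -f "$LOG"
echo "reservation-conflict lines: $conflicts"
```

If a stale reservation is confirmed, ESX 3.x also has `vmkfstools -L lunreset` to reset it, but that is disruptive to anything using the LUN - best done with the SAN team and/or VMware support involved.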
I hope this helps
cheers!
Thanks for your updates. I am able to vmkping between the two of them, no worries. There are only 2 hosts in the cluster and I can ping to and from both of them.
I think there is a reservation on the storage side, as one of you has said. I'm unable to find which host is causing it. I'm pretty sure it was working a while back when I first built the cluster. I hadn't done anything with the cluster for a while and now I'm trying to upgrade it to U4. I've got a number of other clusters around the world working fine.
Thanks
I've had this in the past and the issue turned out to be ACLs preventing ICMP ping to the ESX servers' default gateway...
Thanks, I just confirmed I'm able to ping the default gateway on both boxes.
Hi - were you able to find the host holding the locks?
Also, check the SC is not running out of memory.
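For reference, the Service Console is a standard Linux environment, so checking its memory is straightforward. A quick sketch to run on the ESX console (the 272 MB figure is the ESX 3.x default SC allocation):

```shell
# Show Service Console memory and swap usage in MB
free -m
# If free memory is consistently near zero and swap is heavily used,
# consider raising the SC memory allocation from the default (272 MB
# on ESX 3.x) via the VI Client host configuration.
```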
-Thanks.
Also check permissions on the SAN; we had similar issues and it was permission related.
I'm not sure how to determine what is holding the reservations on the LUN. There is only 1 VMFS volume presented to the cluster at the moment.
Here is some log info from the vmkernel log on the other host:
tail -f /var/log/vmkernel
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.143 cpu4:1512)Sched: vm 1513: 5366: moved group 198 to be under group 17
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.158 cpu4:1512)Swap: vm 1513: 2169: extending swap to 1048576 KB
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.535 cpu5:1512)Migrate: vm 1513: 7338: Setting migration info ts = 1245975808481713, src ip = <10.11.97.12> dest ip = <0.0.0.0> Dest wid = -1 using SHARED swap
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.535 cpu5:1512)World: vm 1514: 900: Starting world migSendHelper-1513 with flags 1
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.535 cpu5:1512)World: vm 1515: 900: Starting world migRecvHelper-1513 with flags 1
Jun 26 10:26:21 MSC04ESX-VMSC vmkernel: 2:20:11:56.547 cpu6:1512)WARNING: Migrate: 1346: 1245975808481713: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
Jun 26 10:26:21 MSC04ESX-VMSC vmkernel: 2:20:11:56.547 cpu6:1512)WARNING: Migrate: 1243: 1245975808481713: Failed: Migration determined a failure by the VMX (0xbad0091) @0xa048e5
Jun 26 10:26:21 MSC04ESX-VMSC vmkernel: 2:20:11:56.548 cpu6:1512)Sched: vm 1513: 1031: name='vmm0:MSC01AAC'
Jun 26 10:26:21 MSC04ESX-VMSC vmkernel: 2:20:11:56.548 cpu6:1512)CpuSched: vm 1513: 13864: zombified unscheduled world: runState=NEW
Jun 26 10:26:21 MSC04ESX-VMSC vmkernel: 2:20:11:56.548 cpu6:1512)World: vm 1513: 2488: deathPending set; world not running, scheduling reap
Interesting - dest ip 0.0.0.0. Is that expected?
Jun 26 10:25:21 MSC04ESX-VMSC vmkernel: 2:20:10:56.535 cpu5:1512)Migrate: vm 1513: 7338: Setting migration info ts = 1245975808481713, src ip = <10.11.97.12> *dest ip = <0.0.0.0>* Dest wid = -1 using SHARED swap
Can you try a cold migration? It should work.
Let us know the results.
Do you receive any errors in the VI Client? How loaded are your hosts? I have also seen this with an improperly set reservation on a resource pool, but in that case you get a "not enough resources" error in the VI Client.
I have done cold migrations and they worked fine to and from both hosts. The current load on the cluster is only about 15-20%. The only error I get in the VI Client is that it times out after about 5 or 10 minutes.
Can I ask you to post up the hardware specifications of the hosts?
Specifically, each host's CPU make/model/type etc.
Thanks
Lofty
What happens if you vmotion the other way - same error message?
Have you got multiple datastores? Can you move the machine to a different datastore in the cluster, fire it up on the same host, and then do the migration (VMotion) again?
If you have HA in place, are your hosts added with IP addresses or FQDNs?
Try this and let me know...
Hope this helps.
Are you running it with "high priority/reserve CPU for optimal VMotion performance" or "low priority/perform with available resources"? Possibly a limit on the resource pool on either the host you're migrating from or the one you're migrating to can't reserve the amount of processing needed to allow the migration. You said it works when cold migrating, so it sounds like a resource allocation problem to me. Maybe try the "perform with available resources" option. Also, try migrating it to the top level of the ESX host instead of putting it into a resource pool, as that may be limiting you, especially if there is a large number of machines in the destination location.
If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
Gregg
Hey guys,
Thanks for all your responses. I've logged a case with VMware; they have advised that for some reason it is backtracing.
AdapterServer caught unexpected exception: Invalid state
I have confirmed the licence is OK and I can see the traffic flowing through the firewall. If I keep trying, I'm able to VMotion the boxes, however it takes a few goes at it.
Thanks