I have a big problem with vMotion on our ESX 5.1 cluster. We have two ESX hosts connected via iSCSI to two SANs. There are also two switches, so every part of the cluster is redundant. The virtual machines run on esxhost-A and have their VMDKs and other files stored on SAN-A. Replication jobs copy them to the second host (esxhost-B). I had to increase the RAM in both ESX hosts without any downtime. My plan was to shut down esxhost-B first (no virtual machines were running on it) and increase its RAM, then start esxhost-B and migrate all machines from esxhost-A to esxhost-B. That all worked fine and fast. After that I increased the RAM in esxhost-A and started it up. Then I wanted to migrate all the machines back to esxhost-A, and this doesn't work.
If I try to migrate a machine back to esxhost-A, an error pops up at 65% of the migration:
I can't increase the switchover time, because I would have to shut down the VMs to do that, and that is not an option for me.
Can anybody help me out?
Edit 21.08.2013: I added an attachment with a summary of the hostd.log. It covers the time during which the error occurs.
Edit 22.08.2013: Added a new attachment with the vmware.log.
Can nobody help me?
No ideas? My last attempt was to reboot both hosts, but nothing changed. Migrating VMs from "host A" to "host B" still works, but there is no way to migrate from "host B" to "host A". Currently we do have redundancy, because all of the VMs are running on "host A" and, in case of a disaster, they can migrate to "host B", but I can't believe that there is no solution. Is it possible to open a single support request? We don't want to buy support for a year only to fix this problem.
It seems your ESXi hosts have time configuration issues. Could you please correct the time settings or, if you have configured NTP on the hosts, make sure it is set up correctly?
For more info, please refer to the KB article: VMware KB: vMotion fails at 63% with the error: The migration exceeded the maximum switchover time o...
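If it helps, the clocks and NTP settings of all hosts can be compared side by side from PowerCLI. This is only a rough sketch: it assumes you are already connected with Connect-VIServer and that the NTP service key on the hosts is 'ntpd'; the property names in the output object are made up for illustration.

```powershell
# Sketch: list NTP configuration and current clock for every host.
# Assumes an existing Connect-VIServer session.
foreach ($esx in Get-VMHost) {
    # DateTimeSystem is the host's time manager in the vSphere API
    $dts = Get-View -Id $esx.ExtensionData.ConfigManager.DateTimeSystem

    [pscustomobject]@{
        Host        = $esx.Name
        NtpServers  = (Get-VMHostNtpServer -VMHost $esx) -join ', '
        # Assumption: the NTP daemon is registered under the key 'ntpd'
        NtpdRunning = (Get-VMHostService -VMHost $esx |
                       Where-Object { $_.Key -eq 'ntpd' }).Running
        HostTimeUtc = $dts.QueryDateTime()
        # Clock of the machine running PowerCLI, for comparison
        LocalUtc    = (Get-Date).ToUniversalTime()
    }
}
```

If HostTimeUtc differs noticeably between the two hosts, time configuration is worth a closer look; if the values agree, NTP is probably not the cause.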
I checked the NTP settings on both hosts. NTP is working fine: the daemon is running and the time is correct. Pinging the NTP server works. Only the second step in KB 1005092, "Troubleshooting NTP on ESX and ESXi", doesn't work. If I run the command watch "ntpq -p ESX-host" for 30 seconds, I get the error message "***Request timed out". I continued with step 4, and the tcpdump didn't show output like the example in the KB, but I think this output isn't wrong, or is it?
~ # tcpdump-uw -c 5 -n -i vmk0 host ntp-server and port 123
tcpdump-uw: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmk0, link-type EN10MB (Ethernet), capture size 96 bytes
07:03:18.504522 IP 172.29.5.128.123 > 172.29.1.4.123: NTPv4, Client, length 48
07:03:18.504783 IP 172.29.1.4.123 > 172.29.5.128.123: NTPv4, Server, length 48
Additionally, the version of the NTP server is the following:
ntpd - NTP daemon program - Ver. 4.2.4p6
So does anybody have an idea?
From your reply it seems NTP is not the issue. Does the particular VM have a high I/O workload running in it?
To work around the problem, you can modify fsr.maxSwitchoverSeconds in the VM's configuration.
If you don't know how to do it, please refer to the KB article: VMware KB: Using Storage vMotion to migrate a virtual machine with many disks timeout
Alternatively, you can follow the steps in the link, which has screenshots showing how to do the same.
But that can't be the solution. Before I upgraded the hardware of both hosts, there was no trouble migrating between them. The problem only occurred after the hardware upgrade. I have over 30 VMs running on "host-A" which are in constant use by the whole company. Shutting down every VM and changing the value of fsr.maxSwitchoverSeconds can't be the only solution. Or rather, I hope it can't be, because doing this would be very difficult and costly.
You can configure advanced VM parameters on the fly without shutting down VMs through PowerCLI. The VMs just need a stun/unstun cycle (i.e. make a snapshot and remove it) for the parameters to become effective.
Refer to these articles for more information:
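# Prompt for the VM name and the new switchover timeout
$vm = Read-Host 'Please provide VM name'
$Seconds = Read-Host 'Please provide fsr.maxSwitchoverSeconds value in seconds'
$snapname = 'Temp'
# Create the setting; -Force overwrites it if it already exists on the VM
Get-VM $vm | New-AdvancedSetting -Name fsr.maxSwitchoverSeconds -Value $Seconds -Force -Confirm:$false
# Creating and removing a temporary snapshot stuns/unstuns the VM so the setting takes effect
Get-VM $vm | New-Snapshot -Name $snapname
Get-VM $vm | Get-Snapshot -Name $snapname | Remove-Snapshot -Confirm:$false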
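With 30+ VMs, the same steps could be looped over every VM on a host instead of prompting for each name. The following is a minimal sketch, not a tested procedure: it assumes an existing Connect-VIServer session, and the host name 'esxhost-A', the value 100 and the snapshot name 'Temp' are placeholders rather than recommendations.

```powershell
# Sketch: apply fsr.maxSwitchoverSeconds to all VMs on one host.
# Assumes an existing Connect-VIServer session; values below are placeholders.
$hostName = 'esxhost-A'
$seconds  = 100
$snapName = 'Temp'

foreach ($vm in Get-VM -Location (Get-VMHost -Name $hostName)) {
    # -Force overwrites the setting if it already exists on the VM
    New-AdvancedSetting -Entity $vm -Name 'fsr.maxSwitchoverSeconds' `
        -Value $seconds -Force -Confirm:$false | Out-Null

    # Only running VMs need the stun/unstun cycle
    if ($vm.PowerState -eq 'PoweredOn') {
        New-Snapshot -VM $vm -Name $snapName |
            Remove-Snapshot -Confirm:$false
    }

    # Show what is now stored in the VM's configuration
    Get-AdvancedSetting -Entity $vm -Name 'fsr.maxSwitchoverSeconds' |
        Select-Object Entity, Name, Value
}
```

It is probably wise to run this against one uncritical VM first and retry the migration before applying it to the rest.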
Lots of my VMs are failing SvMotion at 96% with this same error message, "migration has exceeded the maximum switchover time", so I'd like to try the PowerCLI script that Alisowski Doitt has posted here.
First question: Can anyone with some experience with New-AdvancedSetting reassure me that there won't be any scary unintended consequences when I create this new setting?
Second question: How high can I safely set the switchover time? Two hundred seconds? Three hundred?