I have a VM that BSODs after vMotion from ESX 4.0 to ESXi 4.1. I have vMotioned 150+ other VMs just fine, but this one is giving me trouble. When I put it back on the 4.0 host, it boots back up normally.
Any suggestions?
Thanks
Which BSOD do you get?
Does the BSOD only occur with vMotion or also when you power on the VM on the 4.1 host?
Has the VM been created as a virtual machine or has it originally been P2V'd? In case of a P2V'd VM, there might still be some old hardware related drivers which may cause the BSOD.
Does the VM have any individual CPU mask settings? Maybe it helps to reset them.
André
Thank you for the quick reply Andre! Come Monday I will have answers to some of your questions.
Thanks!
Which BSOD do you get?
- I am not sure because I have to wait for an outage window to reproduce it again.
Does the BSOD only occur with vMotion or also when you power on the VM on the 4.1 host?
-It occurs a few minutes after the guest has been vmotioned onto the 4.1 host. It stays up for a few minutes and then blue screens. I was not able to print screen the blue screen fast enough. When I would power cycle the guest, it would get stuck in a reboot cycle.
Has the VM been created as a virtual machine or has it originally been P2V'd? In case of a P2V'd VM, there might still be some old hardware related drivers which may cause the BSOD.
-Upon looking into the guest some more, it does appear it was a P2V. Looking at hidden devices, I do see a couple. The ones that worry me most are the CPUs from the old system. I do not see old NICs or any others that stand out. Any suggestions, or particular ones to look for?
Does the VM have any individual CPU mask settings? Maybe it helps to reset them.
-What do you mean by this? CPU/MMU virtualization? If so, it is set to automatic.
Thanks in advance, and I am working on getting an outage to be able to play with the guest some more.
Thanks again.
Regarding Andre's comment about CPUID masks, that's a great place to investigate.
A) Is your cluster running an EVC baseline?
B) Does your VM have a CPUID mask defined? Check under Edit > Options > CPUID Mask.
C) Do other VMs have a manual CPUID mask set? Are their masks different from this one VM's?
D) If you vMotion a VM with a mask that has (for example) SSE4 enabled onto a VMhost that doesn't support SSE4, you can definitely get a BSOD. An EVC baseline will prevent this from happening.
Ben
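To make point D concrete, here is a minimal Python sketch of the check involved: does the guest see a CPU feature bit that the destination host lacks? SSE4.1 and SSE4.2 are reported in CPUID leaf 1, ECX bits 19 and 20. The function name and the register values are hypothetical, chosen only for illustration; real values would come from the VM's mask and the host's CPUID.

```python
# CPUID leaf 1, ECX register: bit positions of the SSE4 feature flags.
SSE4_1 = 1 << 19
SSE4_2 = 1 << 20

FEATURE_NAMES = {SSE4_1: "SSE4.1", SSE4_2: "SSE4.2"}


def features_missing_on_host(guest_visible_ecx, host_ecx):
    """Return names of features the guest was shown but the host lacks."""
    missing_bits = guest_visible_ecx & ~host_ecx
    return [name for bit, name in FEATURE_NAMES.items() if missing_bits & bit]


# Hypothetical values: the guest was shown both SSE4 bits,
# while the destination host supports neither.
guest_ecx = SSE4_1 | SSE4_2
destination_host_ecx = 0
print(features_missing_on_host(guest_ecx, destination_host_ecx))
```

A non-empty result is the situation described above: the guest may execute instructions the new host cannot run.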
Thank you for jumping in as well Ben.
A) is your cluster running an EVC baseline?
-No EVC is being used in the cluster.
B) If your VM has a CPUID mask defined Edit > Options > CPUID Mask
-I guess I can't see if it does because the VM has to be powered off?
I could answer C once I have B, and as for D, there is no EVC.
Thanks Ben!
Without powering off, you can look at the .vmx files of the two VMs and compare them. Or you can do this via PowerCLI:
Connect-VIServer YourvCenterServerName
(Get-VM VM1 | Get-View).Config.CpuFeatureMask
(Get-VM VM2 | Get-View).Config.CpuFeatureMask
and compare the difference (if any).
Ben
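If you go the .vmx route, the comparison can be scripted offline. The following is a rough Python sketch, not a VMware tool: it assumes the mask entries appear in the .vmx file as lines of the form cpuid.<leaf>.<register> = "<mask>", and the function names and sample file contents are made up for illustration.

```python
import re

# Matches .vmx lines such as: cpuid.80000001.edx = "----:----:..."
CPUID_LINE = re.compile(r'^\s*(cpuid\.[^\s=]+)\s*=\s*"([^"]*)"\s*$', re.IGNORECASE)


def cpuid_entries(vmx_text):
    """Collect the cpuid.* settings from the text of a .vmx file."""
    entries = {}
    for line in vmx_text.splitlines():
        match = CPUID_LINE.match(line)
        if match:
            entries[match.group(1).lower()] = match.group(2)
    return entries


def diff_cpuid(vmx_a, vmx_b):
    """Return {key: (value_in_a, value_in_b)} for every differing cpuid entry."""
    a, b = cpuid_entries(vmx_a), cpuid_entries(vmx_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical file contents; in practice, read the two .vmx files from disk.
working_vm = 'displayName = "VM1"\ncpuid.80000001.edx = "----:----:---H:----:----:----:----:----"\n'
problem_vm = 'displayName = "VM2"\ncpuid.80000001.edx = "----:----:---0:----:----:----:----:----"\n'
for key, (good, bad) in diff_cpuid(working_vm, problem_vm).items():
    print(key, "working:", good, "problem:", bad)
```

An empty result means the two files carry the same cpuid entries; a value of None on one side means that VM has no entry for that key.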
Ben - Here is a screenshot of the two boxes compared. The second one is the one I am having issues with. I am not sure how to read the output, but it would seem it has some sort of mask set on it? Take a look when you have a chance, and thanks! I also piped the results out to Notepad because the console was truncating them, and the results were the same in Notepad, heh.
Try that PowerShell statement again; you should not need to select any specific output:
Ah, you know what my problem was? I was trying to run the command on an XP machine and it was not returning any output. I ran it on the box in question and it came back with output. The first time around I had used another method I found to get the data, sorry! What do you think of what I have below? Thanks Ben!
Or rather, if it doesn't return any output, does that mean nothing is set for the mask?
The output in your previous post looks valid; there is definitely something set there. If you get no output, that means no mask.
Ben
OK, thank you Ben. I am not familiar with CPU masks or how to set them. I have my window to work with the guest tonight. I will play around with it and see what I can come up with. Is the suggested setting "Expose the NX/XD flag to guest"? That is what the majority of my VMs are set to.
Thank you for all of your help!
Yes, try to match exactly the masks on the VMs that work well. Maybe you can look at the current mask in one of your templates, or in the .vmx file, and match it line for line.
Ben
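When matching masks line for line, it can help to look up the setting for one specific bit. The Python sketch below assumes the mask is a 32-character string written most significant bit first, in colon-separated groups of four, which is how these masks are commonly shown. The NX/XD flag is CPUID leaf 0x80000001, EDX bit 20. The helper name and the sample mask are hypothetical, and the sketch only locates the character; it does not interpret what the character means.

```python
NX_BIT = 20  # NX/XD flag: CPUID leaf 0x80000001, EDX bit 20


def mask_char_for_bit(mask, bit):
    """Return the mask character that applies to the given bit (0-31).

    Assumes bit 31 is the leftmost character once the colons are removed.
    """
    bits = mask.replace(":", "")
    if len(bits) != 32:
        raise ValueError("expected 32 mask characters, got %d" % len(bits))
    if not 0 <= bit <= 31:
        raise ValueError("bit must be between 0 and 31")
    return bits[31 - bit]


# Hypothetical mask value for the cpuid.80000001.edx entry.
sample_mask = "----:----:---H:----:----:----:----:----"
print(mask_char_for_bit(sample_mask, NX_BIT))
```

Running the same lookup against a working VM and the problem VM shows whether the two differ on that particular flag.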
Thanks a lot Ben! I will hammer away tonight.
Much appreciated!
After working on the guest machine last night, I have it working. Here is everything that I did to it.
Cloned the source guest so as not to impact it and to have a rollback.
Everything I did to the clone:
Removed the serial port
Removed time sync with the ESX host - the guest resides in a different domain, which is in a different time zone.
Removed hidden devices - CPUs left over from the P2V
Removed the VMware Converter agent
Set the CPU mask to resemble that of a similar server - 2 CPU
Removed unneeded software left over from the physical server
Installed VMware Tools once the guest was vMotioned and stable - waited 15 minutes before doing this.
Changed the SCSI controller from BusLogic to LSI Logic Parallel.
It has been up and running for 12+ hours now!
Thanks for all the help!