VMware Cloud Community
IT_Architect
Enthusiast

Emergency! Changed Processors, VMs won't start

Help! I just changed processors from a 5570 to a 5670. The VMs ran fine for a while, but then froze. Now I can't power them up. The two processors are the same except for the number of cores.


Accepted Solutions
Troy_Clavell
Immortal

Have you restarted the management agents on the ESXi host where this guest is registered? Also, have you tried removing the guest from inventory, then browsing the datastore, right-clicking the .vmx file, and choosing "Add to Inventory"?

Also, take a look at the article below:

http://kb.vmware.com/kb/1022046
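
The steps suggested above can also be run from the ESXi shell instead of the client. The following is only a sketch that assembles the usual `vim-cmd` command lines. The VM id (12) and the .vmx path are hypothetical placeholders, and nothing here is taken from the KB article. The real id comes from the `getallvms` listing, and the VM normally receives a new id once it is registered again.

```python
import shlex


def reregister_commands(vm_id, vmx_path):
    """Build the ESXi shell commands for the restart / re-register steps.

    vm_id and vmx_path are supplied by the caller; the values used in the
    example below are hypothetical placeholders.
    """
    return [
        "/sbin/services.sh restart",                         # restart the management agents
        "vim-cmd vmsvc/getallvms",                           # list registered VMs and their ids
        f"vim-cmd vmsvc/unregister {int(vm_id)}",            # remove the VM from inventory
        f"vim-cmd solo/registervm {shlex.quote(vmx_path)}",  # add it back from its .vmx
    ]


if __name__ == "__main__":
    # Hypothetical id and datastore path, for illustration only.
    for cmd in reregister_commands(
        12, "/vmfs/volumes/datastore1/critical-vm/critical-vm.vmx"
    ):
        print(cmd)
```

The script only prints the commands so they can be reviewed before anything is typed on the host.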

15 Replies
AWo
Immortal

Which guest OS types are affected?

Are you sure the new processor is OK?

AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
Troy_Clavell
Immortal

Also, what error are you getting? How many hosts did you make this change on? Have you tried removing the VM from inventory, adding it back to inventory, and then powering it on?

IT_Architect
Enthusiast

I'm not sure of anything yet. I did an electrical-disconnect reset. The server is now running and the VMs are mounted, except the critical one.
History:
- The processor upgrade went fine.
- I shut down the VM and upped the cores from 3 to 4.
- I restarted it and it ran fine for an hour.
- Then it stopped communicating.
- I restarted it and it didn't help.
- I had it go back to the snapshot from before the processor upgrade. It threw an error about not being able to go there, but apparently it did.
- I went to the ESXi console. It was black.
- I did a power strip reset.
- It came back and mounted the other two VMs.
- The critical one has a red exclamation mark by it.
- It did not power on. The error says: "The attempted operation cannot be performed in the current state (Powered On)." If I check the VM, it is not powered on and the only option is to power on.
- After a while, when I try it, it now gives the same error as before:
"The features supported by the processor(s) in this machine are different from the features supported by the processor(s) in the machine on which the checkpoint was saved. Please try to resume the snapshot on a machine where the processors have the same features."  I don't have a machine like that, and the other two are running fine, but I haven't reverted the snapshots on those.
- It DID remove the snapshot, according to the VM Client.

AWo
Immortal

Regarding the snapshot:

If you took the snapshot while the machine was running, the processor state was saved as well. That saved state no longer matches the new processor.
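
To make the idea concrete, here is a minimal sketch of that kind of check. It is not VMware's actual logic. It only assumes, as the error message suggests, that a saved running state can be resumed when the feature set recorded with it is the same as the feature set of the current host. The feature names are hypothetical placeholders.

```python
def compare_features(saved_features, host_features):
    """Compare the CPU features recorded with a snapshot against the host's.

    Returns (ok, missing, extra):
      ok      - True only when both sets are identical
      missing - features the snapshot expects but the host lacks
      extra   - features the host has that the snapshot never saw
    """
    saved = set(saved_features)
    host = set(host_features)
    missing = sorted(saved - host)
    extra = sorted(host - saved)
    return (not missing and not extra), missing, extra


if __name__ == "__main__":
    # Hypothetical feature names, for illustration only.
    old_cpu = {"feature_a", "feature_b"}
    new_cpu = {"feature_a", "feature_b", "feature_c"}
    ok, missing, extra = compare_features(old_cpu, new_cpu)
    print("resume allowed:", ok, "| missing:", missing, "| extra:", extra)
```

A snapshot taken while the VM is powered off carries no saved processor state, so a comparison like this would not apply to it.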

AWo

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
IT_Architect
Enthusiast

I'll try the remove-from-inventory approach.

IT_Architect
Enthusiast

- I went with the recommended settings. It came back with "Unregistered virtual machine" when I tried to start it.
- I set it up again with the settings I had before and tried again.
- The OS (FreeBSD) started, but the volumes are not mounting.

IT_Architect
Enthusiast

I'm restoring the VM from a backup on another server.  The backup contains .vmx and the two vmdks.  Any suggestions?

IT_Architect
Enthusiast

All the VMs boot up fine. The problem is, they don't communicate over the NICs. Could this be because I am using 4.0 instead of 4.1, and the 4.1 has a problem with 6-core CPUs?

Chamon
Commander

You can also create a new VM and then delete the default drive that is created with it. Then add the VMDK file(s) that were used by the old, non-functioning VM to the new VM. That gives you a VM configured for the new processor.
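
Before rebuilding the VM this way, it helps to know exactly which VMDK files the old VM referenced. A rough sketch follows, assuming the old .vmx uses the usual `key = "value"` lines with `<device>.fileName` entries for its disks. The sample content and file names are hypothetical.

```python
import re

# One 'key = "value"' pair per line; comment lines starting with '#' are skipped.
VMX_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict of key -> value strings."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def disk_files(settings):
    """Map each device (e.g. 'scsi0:0') to the .vmdk file it points at."""
    suffix = ".fileName"
    return {
        key[: -len(suffix)]: value
        for key, value in settings.items()
        if key.endswith(suffix) and value.lower().endswith(".vmdk")
    }


# Hypothetical .vmx content, for illustration only.
SAMPLE_VMX = """
displayName = "critical-vm"
numvcpus = "4"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "critical-vm.vmdk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "critical-vm_1.vmdk"
ide1:0.fileName = "/vmfs/volumes/hypothetical/install.iso"
"""

if __name__ == "__main__":
    for device, path in sorted(disk_files(parse_vmx(SAMPLE_VMX)).items()):
        print(device, "->", path)
```

The devices it lists are the disks to attach to the new VM with "use an existing virtual disk", ideally on the same controller positions as before.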

IT_Architect
Enthusiast

I explained the whole thing of what happened, but this awful, awful forum software lost the entire thing. I'll have to come back to it later after I cool off.

Chamon
Commander

That's happened to me before as well. If I know I am going to type a long post I write it in a document then copy and paste.

IT_Architect
Enthusiast

>That's happened to me before as well. If I know I am going to type a long post I write it in a document then copy and paste.<

And I knew how awful it is, and I often do that too. However, I'm so used to working on all the other forums, which use the de facto standard vBulletin, that I forgot to this time. I'm not going to spend the time organizing it like I did the first time, but this is the gist of it.

1.  Two nights ago, the DC was supposed to replace the processor. I set the alarm, waited, and they dropped the ball. This time I set the alarm for half an hour into the maintenance window. When that time came, I saw they hadn't updated it, so I appended to the ticket. No answer. I opened a new ticket pointing to the original ticket. It took another hour to find the guy with the ticket, and by then we were outside of the maintenance window by over an hour. The tech told me he was outside of the window unless I wanted it down now. I sent another ticket back telling him to do it now. I watched and watched, and “now” didn't happen. 22 minutes after “now”, I appended the ticket to scratch it, since traffic was picking up. Then the server went down. He had the processor in and the server up within 10 minutes after that, and everything was working really well.

2.  Over the next hour and a half I was very pleased with the performance of the new 6-core processor. I was comparing graphs from before and after. I had just relaxed when I got a text on my phone. It was Zabbix telling me the site was down. I thought, how can that be with loads of .65? I went to the site, but no site. Then I tried to connect via the CP. No luck. Then I went to the VM Client. The load was almost nothing. Then I checked out the other two VMs. They seemed fine, but not the critical VM.

3.  I thought: it worked well for the past hour and a half, so what is the deal? I had set a snapshot before the upgrade, so I decided to go back to that. Bad move. I got a message about how it couldn’t do that because of the new CPU. I no longer have the exact error message since it is now lost due to this forum. Well, I figured I might as well just delete the snapshot then. That worked, but something must have happened when I told it to go to the snapshot, because now it wouldn’t start and had a red exclamation point next to it. I tried a few suggestions here on the thread and got it to boot, but it needed a bunch of fsck runs and basically ended up damaged to the point where nobody would trust it. So I spent another 56 minutes bringing it back from the backup server. It booted up fine then, but simply didn’t work.

4.  Next I typed up a support ticket explaining all this. They said they would escalate the ticket. Within a few minutes, I saw a ticket confirmation come in with a different ticket number on it. I figured they had started a new one and was going to ignore it, but for some reason I looked at it. It had nothing to do with my ticket, but explained about some “Short Notice” maintenance they had to do. Then a couple of pieces snapped together in my mind: the other two VMs that were fine, I had checked over the VPN. I updated my other ticket, told them to hold off yanking the new processor until they got their network straightened out, and gave them some IP addresses of VMs to ping.

5.  I launched the VCL and went to the console on the Windows machine, and logged in.  I tried to ping common servers on the internet and could not.  I pinged the other VMs.  I SSHed into them.  Then I launched a browser, and while I could not go to the internet, I could open the control panel on the critical machine over the VPN.  All of the processors were running fine.

6.  While going through all this checking, I got another text message telling me they were back on line.  Shortly thereafter I received an e-mail that said they can ping the VMs that I gave them.

Thus, I was the victim of an unusual series of events that cost me a day, untold damage to a site that normally receives 6.5 million unique visitors a month, and stress that took about a year off my life. It also makes this thread meaningless, and it should be deleted because it implicates neither VMware nor the hardware that works with it. About the only lesson I can think of is that you can hurt yourself by trying to go back to a previous snapshot if your processor has changed. This processor was the same family and speed with two cores added.

Thank you, everyone, for the helpful suggestions.

DSTAVERT
Immortal

Glad you are back and running. At least mark things as answered. This is one time where there should be an "Oh never mind" switch in the forums. ;)

-- David -- VMware Communities Moderator
IT_Architect
Enthusiast

>Glad you are back and running. At least mark things as answered. This is one time where there should be an "Oh never mind" switch in the forums. ;)<


Thank you for reminding me. I'm so drained from the rapid-fire typing and trying things, and now from going 100 mph to make up for lost time, that I would never have remembered.
