VMware Cloud Community
wililupy
Enthusiast

Error during upgrade from ESX 3.x to 4

Hello,

I was on the phone with VMware for two hours about this problem, and no one could figure it out.

We want to upgrade our existing ESX clusters from 3.0.2 to 4. We had no problems in the test environment, but when we tried this in production we hit a huge snag.

We have already upgraded Virtual Center from 2.0.2 to 4 with no real hiccups. It took a little longer because of some manual changes we had to make to the database, but overall it was pretty straightforward.

Today, we tried to upgrade one of the ESX servers. I evacuated all the running VMs on the server and manually placed it in Maintenance mode. I ran the Host Update Utility instead of the Update Manager because I didn't want to upload the ESX DVD ISO across our WAN. (The Virtual Center Server is in another city, with 5 ESX servers there, 3 at my site, and 2 at another city's site.) I started the Host Update Utility, and everything seemed to be going well until I got an error saying that the upgrade failed. The console on the ESX server showed the following:

Driver's Successfully Loaded

Cannot find device with UUID: xxxxxxxxxxx

Error: Cannot find device with UUID: xxxxxxxxxx

Press <return> to reboot

When I reboot the server, it starts the upgrade again and fails at the same point. I managed to get the server to boot back into ESX 3.0.2 successfully, and I found out what the UUID refers to: the /boot partition. I checked /boot/grub/grub.conf, and everything matches up fine.
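For anyone comparing these UUIDs by hand, here is a minimal sketch that pulls the bootpart= and root= arguments out of a GRUB config so they can be checked against the partitions. The function name list_boot_args is made up for illustration, and the sketch assumes the arguments appear as whitespace-separated tokens, which may not hold for every grub.conf.

```shell
# Hypothetical helper: print every bootpart=... and root=... argument
# found in the GRUB config file given as $1, one per line.
# Assumption: arguments are separated by spaces or tabs.
list_boot_args() {
    # Split the file into one token per line, then keep the boot arguments.
    tr -s ' \t' '\n\n' < "$1" | grep -E '^(bootpart|root)='
}
```

Running list_boot_args /boot/grub/grub.conf lists the values the boot entries refer to, which makes a mismatch easier to spot than reading the file line by line.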

I then modified grub.conf so that it boots ESX Server instead of ESX Upgrade, and contacted VMware support. We tried multiple things, including an upgrade script that you run directly on the ESX server, but it only does what the Host Update Utility does.

The technician then recommended that I upgrade to ESX 3.5 and try again, so I did that, and I still get the same error.

If I remove the bootpart=xxxxxxxxx and reboot, it says it can't find UUID xxxxxxxx, which is my root partition. I haven't tried removing both lines, but I assume that if I did, the upgrade would have no idea what to upgrade.

We can't do a clean install on these servers because of the configuration on the ESX servers for some of the VMs. We have a VM cluster that is tied specifically to two of my ESX servers and communicates with a SAN directly, not through the ESX server (we had to do this for the company to support the software; not my call, I would have done it right). If it weren't for that one restriction, I would have no problem rebuilding the entire ESX server and reinstalling all my SAN software and drivers so that it can communicate with the LUNs I have carved out for it.

My ESX server hardware is the same across all the sites: Dell PowerEdge 2950s with 2 quad-core Xeon processors, 16 GB of RAM, and six 72 GB SAS hard drives attached to the PERC 5/i controller. Each server has two mirrored pairs and two hot spares. They are connected to my Dell CX-310c SAN via the QLogic FC card.

I am at a loss right now. Does anyone have any ideas?

35 Replies
wililupy
Enthusiast

Just a quick check: you don't have a SAN connected to the HBA, do you? One thing I ran into was having the SAN plugged in. For some reason, the installer kept wanting to use the MSA 1000 as primary storage and did not see the built-in storage on my PERC controller. I disconnected the MSA and had no issues.

Another thing you can check is to make sure that you don't have custom partitions. If you installed ESX 3.5 with the typical settings and did not make changes to the partition tables during setup, you should be fine.

coreymayo
Contributor

No, the host is not connected to a SAN. I do have a second partition mounted from /etc/fstab so I can try removing that.

On a side note, updating the RAID card's firmware doesn't seem to have changed anything for me.

wililupy
Enthusiast

Upgrading my firmware did not help either. The only way I got it to finally work was to disconnect my HP MSA 1000 from the HBA and boot up that way.

It seemed like the MSA was trying to become the boot device during boot, even though I had disabled it as a boot device in the HBA's BIOS and told my server's BIOS to use the PERC controller. I also verified that the PERC controller BIOS was enabled to boot the server.

VMware told me that the major change between version 3.x and 4 is the way the system handles datastores. It was made so that you can use Storage VMotion, which is one of the main reasons we went for the upgrade.

You shouldn't have to remove the second partition from the /etc/fstab, just try commenting it out.
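As a rough sketch of what commenting it out could look like, the helper below prefixes the matching entry with "#" and prints the result instead of touching the file. The name comment_out_mount and the /data mount point in the usage note are only placeholders, not anything from your setup.

```shell
# Hypothetical helper: print the fstab-style file $1 with every active
# entry whose mount point (second field) equals $2 commented out.
# The file itself is not modified; review the output before using it.
comment_out_mount() {
    awk -v mp="$2" '
        # Skip lines that are already comments; match on the mount point.
        $0 !~ /^[ \t]*#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$1"
}
```

For example, comment_out_mount /etc/fstab /data > /tmp/fstab.new writes a candidate file that you can compare with the original before copying it into place. Keep a backup of /etc/fstab either way.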

Also, how are you performing your upgrade? Are you using the Host Update Utility or the shell script from VMware KB or Update Manager from vSphere Client?

coreymayo
Contributor

Commenting out the second partition in /etc/fstab did nothing. I have tried using both the Host Update Utility and the Update Manager in the vSphere client with no luck. I'll give the script a try.

coreymayo
Contributor

Finally got ESX 4 installed! I was able to install from scratch using the DVD when I passed the "noapic" kernel argument at boot. It is the same issue as in http://communities.vmware.com/thread/154852 : a Supermicro motherboard with the same Adaptec RAID controller.

Thanks for the help!

undrwatr
Contributor

Can you send me the install command you used for the install so I can try it?

coreymayo
Contributor

Sure, this worked when doing a fresh install from the DVD. When the disk first boots, press F2 for other options, then append "noapic" to the end of the boot arguments. I suppose I could have instead added that to grub.conf to get the upgrade to work.
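I have not tested the grub.conf route, but as a sketch of the idea, something like the following would add the argument to each kernel line. The function name append_noapic is made up, and it assumes kernel lines start with the word "kernel" after optional indentation. It only prints the modified config and does not write anything.

```shell
# Hypothetical helper: print the GRUB config file $1 with " noapic"
# appended to every kernel line that does not already carry it.
# Untested against a real ESX upgrade entry; treat as an illustration.
append_noapic() {
    awk '
        /^[ \t]*kernel[ \t]/ && $0 !~ /(^|[ \t])noapic([ \t]|$)/ {
            print $0 " noapic"
            next
        }
        { print }
    ' "$1"
}
```

Redirect the output to a scratch file, check it, and keep a copy of the original grub.conf so you can boot back into the old entry if the change does not help.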

wililupy
Enthusiast

What hardware is your host running? If it's similar to mine, you may have found a fix for this issue that is easier than unplugging my SAN and plugging it back in after ESX has booted.

I'll try the noapic boot parameter with my SAN left plugged in to see if that works.

Funny that you had to disable the Advanced Programmable Interrupt Controller to get it to work. Seems like a bug that needs to be addressed...

Something tells me that it will.

Good job!

undrwatr
Contributor

I figured out that my problem was related to how the VMFS datastore was named. Someone had named it esx01:storage1, and for some reason that colon seemed to break the whole install process. Once I fixed that, I was able to complete the install and get past the error.
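If you have several hosts to check, a small sketch like this can flag datastore names that contain a colon before you start the upgrade. The function name flag_colon_labels is made up for illustration, and you would feed it whatever names your hosts actually use.

```shell
# Hypothetical helper: print each argument that contains a colon.
# Pass the datastore labels you want to check as arguments.
flag_colon_labels() {
    for label in "$@"; do
        case "$label" in
            *:*) printf '%s\n' "$label" ;;
        esac
    done
}
```

For example, flag_colon_labels esx01:storage1 storage2 prints only esx01:storage1, the name that would need to be changed.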

CrazyTao
Contributor

Failed again!

Has anybody resolved this problem?

Thanks~~

CrazyTao
Contributor

Failed again!

Has anyone resolved this problem?

Thanks~~

7529_7529.jpg

[root@xac root]# vmkfstools -Ph /vmfs/volumes/xacstorage1
VMFS-3.31 file system spanning 1 partitions.
File system label (if any): xacstorage1
Mode: public
Capacity 141G, 139G available, file block size 1.0M
Partitions spanned (on "lvm"):
vmhba32:0:0:3

I opened /etc/vmware/ks-upgrade.cfg and changed "xacstorage1" to its UUID.

7530_7530.jpg

7531_7531.jpg

7532_7532.jpg

Load drivers.

7533_7533.jpg

Then an error occurs:

7534_7534.jpg

Sreejesh_D
Virtuoso

Can you try again after removing the "/" that is appended after the datastore name? That is, use "/vmfs/volumes/xacstorage1".

VCP, RHCE, EMCPA.

CrazyTao
Contributor

When I removed the "/", it worked fine!

7535_7535.jpg

The "/" was there because I used Tab completion, which appended it automatically.
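In case it helps someone scripting this, here is a minimal sketch that trims the trailing "/" from a path before it goes into the upgrade config. The function name strip_trailing_slash is made up for illustration.

```shell
# Hypothetical helper: print path $1 with any trailing slashes removed.
# A path consisting only of "/" is left as "/".
strip_trailing_slash() {
    p=$1
    while [ "${p%/}" != "$p" ] && [ "$p" != "/" ]; do
        p=${p%/}
    done
    printf '%s\n' "$p"
}
```

For example, strip_trailing_slash /vmfs/volumes/xacstorage1/ prints /vmfs/volumes/xacstorage1.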

Thanks yezdi for giving me a great help again~~

wksantiago
Contributor

That worked for me too. I also had esx03:storage1. I replaced the colon with a space and voilà, it worked.

The initial installation was done by Dell services.

Thanks, undrwatr.

wksantiago
Contributor

Well, I tried another ESX host and it worked perfectly too. Then I went for the last ESX host and hit the same problem. I rebooted and attempted the upgrade several times, even changing the name once again. After several more reboots and attempts, it finally worked.

Very strange. I guess you need to keep trying until it goes through.

dhazar
Contributor

Here is the solution I found to a similar problem in my environment:

http://davidhazar.blogspot.com/2010/10/curse-of-grubupdate-upgrading-from_13.html

The error occurred during the GRUB update process, which is the first part of the install.

Hope this helps someone,

David Hazar

http://www.linkedin.com/in/DavidHazar

http://davidhazar.blogspot.com
