VMware Cloud Community
sqlguy777
Contributor

Upgrade to ESXi 4.1.0 throws purple screen, then re-starts ESXi 4.0 - help!

Hi

My head hurts.....

I have ESXi 4.0.0 (build 208167) installed, and downloaded the ESXi 4.1.0 (build 348XXX) upgrade.

I uploaded the zip file ( upgrade-from-esxi4.0-to-4.1-update01-348481.zip ) into a datastore on the ESXi host, and then proceeded to cd to the location:

cd /vmfs/volumes/Datastore0/Upgrade

then I ran the following command to do the upgrade :

esxupdate --bundle /vmfs/volumes/Datastore0/Upgrade/upgrade-from-esxi4.0-to-4.1-update01-348481.zip  update

The files were then unpacked and the update happened ( or so it seemed ).

So I rebooted the server and it came up with the purple screen:

"Two filesystems with the same UUID have been detected. Make sure you do not have two ESXi installations"

So I rebooted twice, the second time it reverted back to ESXi 4.0 ( the pre-upgrade version ).

Then I wondered if perhaps VMware had screwed up, so I removed all but the boot drive from the server and the correct version ( 4.1.0 ) booted. Great.

So I re-connected the extra disks, and now it boots to the original ESXi - somehow it's reading the other disks and booting from them.

What the hell is going on?

Does ESXi have an equivalent of the Windows boot.ini that I can modify to make sure the correct version ( the newer version ) boots?

Can I modify it from the unsupported admin screen on the server, or do it through PuTTY?

Any help welcome - it seems like an easy fix, but I am *not* a UNIX dude. Sadly I never learnt UNIX or Linux and don't know where to start looking, although I can navigate around directories and use vi, so that's a start.

Any help most welcome,

Cheers

Steve.

9 Replies
Dave_Mishchenko
Immortal

The UUID check is intended to detect a possible error condition where you have 2 separate ESXi installs (see http://www.vm-help.com//esx41/file_system_UUID.php).

What sort of hardware, storage controller and disk setup are you using?

sqlguy777
Contributor

It's a Dell 5400 desktop with 3 internal SAS disks and no RAID, so each disk is individual.

sqlguy777
Contributor

Hi

OK, I tried the fix in the link to the other forum you suggested.

It booted OK the first time, using Shift + O during hypervisor start and then typing in    overrideDuplicateImageDetection

Then I rebooted it a second time and it gave me the purple screen again. Seems it doesn't hold that setting.

Is there a way to permanently fix this please, similar to modifying the ESXi equivalent of a Windows boot.ini?

I'd rather avoid deleting partitions if I can.

Are there tools available that can be used to control which partition / version of ESXi is booted from, or can I delete the previous version of ESXi?

Any thoughts welcome.

DSTAVERT
Immortal

I would say that you have installed ESXi on two different disks. I usually change the partition types on the unwanted partition to type empty using fdisk from the command prompt. Changing to empty does not remove anything and you can revert if you need to. Use the technique that Dave pointed to at his vm-help site to get the host booted then use the Tech Support Console to change the partition types.

-- David -- VMware Communities Moderator
sqlguy777
Contributor

Hi

Seems I assumed ( possibly wrongly ) that ESXi was intelligent enough to install over an existing installation instead of now having a split personality.

OK, I've worked with Windows for years, but have no idea how to ( safely ) change a partition type in UNIX. Can you possibly point me in the right direction for how to do this safely please?

Also, I've had a look at the disk definitions in vSphere, and there are vmhbaXX values, but how do I link these to a volume etc?

If I change the partition type to empty, will it stop access to what's on the disk, which has a datastore on it?

Normally I wouldn't have to go anywhere near the OS, so this is an unexpected rapid learning experience.......

Any help welcome.....

DSTAVERT
Immortal

You will need to use the Tech Support Console. Since you are using 4.1 you will need to activate it. From the Yellow Console log in and select Troubleshooting and enable Tech Support.

Press ALT + F1 to get to the Tech Support Console. Type root and your password to log in.

You will be using fdisk to change the partition types. Changing the partition type will not delete anything. If you mess up you can just change the empty partitions back to FAT. You will only change the partitions that are FAT. The datastores are type fb, so as long as you leave those alone you will be fine.
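
As a rough illustration of the rule described here, the Python sketch below sorts fdisk Id codes into partitions that could be switched to type empty and partitions to leave alone. The function name classify and the two id collections are made up for the example, and the FAT set only covers the two FAT16 ids (4 and 6) that fdisk typically shows on an ESXi 4.x boot disk. Treat it as a sketch of the decision, not a tool to run against a live host.

```python
# Hypothetical helper: encodes the "only touch FAT, leave VMFS alone" rule.
# Id codes are written the way fdisk prints them (hex, lower case).
FAT_IDS = {"4", "6"}  # FAT16 <32M and FAT16 (assumed to be the only FAT ids present)
KEEP_IDS = {
    "fb": "VMFS datastore",
    "fc": "VMKcore",
    "5": "Extended container",
}


def classify(part_id: str) -> str:
    """Return 'candidate', 'leave alone' or 'unknown' for an fdisk Id code."""
    pid = part_id.strip().lower()
    if pid in FAT_IDS:
        return "candidate"      # could be set to type empty and reverted later
    if pid in KEEP_IDS:
        return "leave alone"    # datastore, core dump or extended container
    return "unknown"            # anything else needs a human decision


for pid in ["4", "6", "fb", "fc", "5"]:
    print(pid, "->", classify(pid))
```

Anything the sketch reports as unknown should be checked by hand rather than guessed at.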

From the Tech Support Console, type fdisk -l and it should show you a list of disks and their partitions.

-- David -- VMware Communities Moderator
DSTAVERT
Immortal

You should see something like this

~ # fdisk -l

Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 2004 MB, 2004877312 bytes
64 heads, 32 sectors/track, 1912 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

                          Device Boot      Start         End      Blocks  Id System
/dev/disks/mpx.vmhba1:C0:T1:L0p1             5       900    917504    5  Extended
/dev/disks/mpx.vmhba1:C0:T1:L0p2           901      1912   1036288   fb  VMFS
/dev/disks/mpx.vmhba1:C0:T1:L0p4   *         1         4      4080    4  FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T1:L0p5             5       254    255984    6  FAT16
/dev/disks/mpx.vmhba1:C0:T1:L0p6           255       504    255984    6  FAT16
/dev/disks/mpx.vmhba1:C0:T1:L0p7           505       614    112624   fc  VMKcore
/dev/disks/mpx.vmhba1:C0:T1:L0p8           615       900    292848    6  FAT16

Partition table entries are not in disk order

Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

                          Device Boot      Start         End      Blocks  Id System
/dev/disks/mpx.vmhba1:C0:T0:L0p1             5       900    917504    5  Extended
/dev/disks/mpx.vmhba1:C0:T0:L0p2           901      4995   4193280    6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p3          4996     40960  36828160   fb  VMFS
/dev/disks/mpx.vmhba1:C0:T0:L0p4   *         1         4      4080    4  FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T0:L0p5             5       254    255984    6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p6           255       504    255984    6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p7           505       614    112624   fc  VMKcore
/dev/disks/mpx.vmhba1:C0:T0:L0p8           615       900    292848    6  FAT16

Post back the results of fdisk -l
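
To make the "ESXi on two different disks" diagnosis concrete, here is a minimal Python sketch that reads fdisk -l style text and lists the disks that carry what looks like an ESXi system layout. The heuristic (a bootable type 4 partition plus a VMKcore partition, Id fc) is an assumption made for this example, not an official VMware check, and the names ROW, SAMPLE and disks_with_esxi_layout are hypothetical. SAMPLE is an abbreviated copy of the listing above.

```python
import re

# Abbreviated fdisk -l output (only the rows the heuristic cares about).
SAMPLE = """\
Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 2004 MB, 2004877312 bytes
/dev/disks/mpx.vmhba1:C0:T1:L0p2           901      1912   1036288   fb  VMFS
/dev/disks/mpx.vmhba1:C0:T1:L0p4   *         1         4      4080    4  FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T1:L0p7           505       614    112624   fc  VMKcore
Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 42.9 GB, 42949672960 bytes
/dev/disks/mpx.vmhba1:C0:T0:L0p3          4996     40960  36828160   fb  VMFS
/dev/disks/mpx.vmhba1:C0:T0:L0p4   *         1         4      4080    4  FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T0:L0p7           505       614    112624   fc  VMKcore
"""

# Fields: disk, partition number, optional boot flag, start, end, blocks, id, system
ROW = re.compile(
    r"^(/dev/disks/\S+)p(\d+)\s+(\*)?\s*(\d+)\s+(\d+)\s+(\d+)\+?\s+([0-9a-f]+)\s+(.*)$"
)


def disks_with_esxi_layout(fdisk_text):
    """Return disks holding a bootable type-4 partition and a VMKcore (fc) one."""
    ids_by_disk = {}
    for line in fdisk_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # "Disk ..." headers, blank lines, column titles
        disk, _num, boot, _start, _end, _blocks, pid, _system = m.groups()
        ids_by_disk.setdefault(disk, set()).add((pid, boot == "*"))
    return sorted(
        disk
        for disk, ids in ids_by_disk.items()
        if ("4", True) in ids and any(pid == "fc" for pid, _ in ids)
    )


found = disks_with_esxi_layout(SAMPLE)
if len(found) > 1:
    print("More than one disk looks like an ESXi install:", found)
```

With the abbreviated sample both disks are reported, which is the kind of situation the duplicate detection is meant to catch.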

-- David -- VMware Communities Moderator
sqlguy777
Contributor

Hi

In the end I used this method to fix the problem:

(1) When the hypervisor screen appeared, I pressed Shift + O and then typed    overrideDuplicateImageDetection   and then hit Enter to boot.

This allowed the server to boot into 4.1.0

(2) I moved all the VMs out of the datastore on the same disk that contained the old system I wanted to remove - I moved the VMs to another datastore on a different disk.

(3) I tried to delete the empty datastore, but it errored. I then realised I needed to enable the Remote Tech Support Console so I could connect remotely with PuTTY and remove all disk partitions on the drive in question:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101791...

(4) Once connected in PuTTY,  I deleted all partitions on the disk and rebooted the box. This removed all old partitions used for VM storage *plus* it removed the old installation of 4.0 which is what I was trying to do originally.

The tool used was fdisk.

I identified the disk by going to the Configuration tab in the vSphere client, then choosing the devices view. I found the name of the disk, then found its Identifier.

The identifier is what you need to pass to fdisk.

The syntax I used was:

fdisk  /dev/disks/<disk_identifier>

(5) To display all the partitions on the disk I used this command:

p  ( fdisk is already running )

You will see a display of the partitions on the disk like this:

~ # fdisk /dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188

The number of cylinders for this disk is set to 305245.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): m


Command Action

a       toggle a bootable flag
b       edit bsd disklabel
c       toggle the dos compatibility flag
d       delete a partition
l       list known partition types
n       add a new partition
o       create a new empty DOS partition table
p       print the partition table
q       quit without saving changes
s       create a new empty Sun disklabel
t       change a partition's system id
u       change display/entry units
v       verify the partition table
w       write table to disk and exit
x       extra functionality (experts only)

Command (m for help): p

Disk /dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188: 320.0 GB, 320072933376 bytes
64 heads, 32 sectors/track, 305245 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

                                                                                 Device Boot      Start         End      Blocks  Id System
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p1             5       900    917504    5  Extended
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p4   *         1         4      4080    4  FAT16 <32M
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p5             5       254    255984    6  FAT16
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p6           255       504    255984    6  FAT16
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p7           505       614    112624   fc  VMKcore
/dev/disks/t10.ATA_____WDC_WD3200AAKS2D75L9A0________________________WD2DWCAV2A141188p8           615       900    292848    6  FAT16

Partition table entries are not in disk order

Note - the partition number that fdisk asks for is the number after the "p" at the end of the device name ( 1 for "p1", 4 for "p4", etc ).
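
For anyone scripting around this, a small Python sketch of that naming convention: it pulls the partition number out of a device name that ends in pN. The function name partition_number is invented for the example, and it assumes whole-disk identifiers never end in "p" followed by digits, which may not hold for every identifier.

```python
import re


def partition_number(device_name: str) -> int:
    """Return N for a device name that ends in pN, e.g. ...L0p4 -> 4."""
    m = re.search(r"p(\d+)$", device_name)
    if m is None:
        # No trailing pN: most likely a whole-disk name, not a partition
        raise ValueError(f"no partition suffix in {device_name!r}")
    return int(m.group(1))


names = [
    "/dev/disks/mpx.vmhba1:C0:T0:L0p1",
    "/dev/disks/mpx.vmhba1:C0:T0:L0p4",
    "/dev/disks/mpx.vmhba1:C0:T0:L0p8",
]
print([partition_number(n) for n in names])
```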

(6) So I then used the "d" command and deleted each partition - 1,4,5,6,7,8 - and this cleared the disk.

(7) The box then came up trouble free without the purple screen - I then created a new datastore & rebooted the box - job done.

The key was deleting the partitions.

Cheers

DSTAVERT
Immortal

Glad you are up and going.

-- David -- VMware Communities Moderator