VMware Cloud Community
ManoelH
Contributor

Unable to power on VM - ESXi 5.5.0

I have RHEL 6.9 running in a virtual machine. After I ran the "yum update" command, I can no longer connect to the VM.

In the BIOS, it shows that no disk was found. The log is attached.

vmware.png browser.png

hd1.png hd2.png

Can you help me?

39 Replies
admin
Immortal

Can you please upload the VMX file and a screenshot of the VM folder?

ManoelH
Contributor

ranchuab I updated the first post with the information requested.

Thanks for help.

daphnissov
Immortal

Did you take a snapshot before running the yum update? If so, you might want to revert.

ManoelH
Contributor

daphnissov I did not take a snapshot beforehand.

daphnissov
Immortal

If you didn't, then it's a guest-related issue at this point and has nothing to do with ESXi. The VM is powered on; it just won't boot.

ManoelH
Contributor

I updated the first post with more screenshots.

daphnissov
Immortal

Yeah there's nothing wrong there. If your VM was working fine one minute and then you performed a yum update and now it won't boot, it's a problem with packages/kernel that got upgraded, not the VM's configuration that changed.

admin
Immortal

Have you made any changes to the VM configuration? From what I can see in the VMware log, it looks like a disk was removed and re-added. Did you do that?

See the timestamps in the VMware log:

2018-01-26T11:32:48.959Z| vmx| I120: DICT      scsi0:0.fileName = VM-Linux-yspp0097-LDAP.vmdk
2018-01-26T11:32:48.960Z| vmx| I120: DICT      scsi1:0.fileName = /vmfs/volumes/52673d7e-c347a1c2-1b79-2c59e53cc874/VM-Linux-yspp0097-LDAP-PROD/VM-Linux-yspp0097-LDAP-PROD.vmdk

The changes were made here:

2018-01-26T11:42:40.491Z| vmx| I120: DICT      scsi0:0.fileName = VM-Linux-yspp0097-LDAP-PROD_2.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT      scsi1:0.fileName = VM-Linux-yspp0097-LDAP-PROD_1.vmdk

Current VMX file:

scsi1:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_1.vmdk"
scsi0:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_2.vmdk"

VM-Linux-yspp0097-LDAP.vmdk is missing here.
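
For anyone following along, the check above can be scripted instead of read by eye. This is only a minimal Python sketch, assuming the VMX content is available as text; the helper name `disks_in_vmx` and the sample string are illustrative and not part of any VMware tool. It collects every scsiN:M.fileName entry and reports whether the original disk name is still referenced.

```python
import re

# Matches lines such as: scsi0:0.fileName = "disk.vmdk" (quotes optional).
DISK_LINE = re.compile(r'^\s*(scsi\d+:\d+)\.fileName\s*=\s*"?([^"\n]*?)"?\s*$')


def disks_in_vmx(vmx_text):
    """Return {device: file name} for every scsiN:M.fileName entry."""
    disks = {}
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match:
            disks[match.group(1)] = match.group(2)
    return disks


# The two entries quoted from the current VMX file in this thread.
sample_vmx = '''
scsi1:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_1.vmdk"
scsi0:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_2.vmdk"
'''

current = disks_in_vmx(sample_vmx)
original_disk = "VM-Linux-yspp0097-LDAP.vmdk"
print(current)
print("original disk still attached:", original_disk in current.values())
```

With the two lines quoted above, the original disk name is not among the attached files, which is the point being made in this post.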

ManoelH
Contributor

ranchuab I did not change anything in the VM before the yum update. After the system update, because the machine would no longer boot, I did a vMotion to another datastore and host.

daphnissov It is not an operating system problem. I have already opened a support ticket with Red Hat, and they confirmed that the VM does not load the disk, so I cannot enter rescue mode to try anything.

bluefirestorm
Champion

There is nothing wrong with the virtual BIOS settings for the disk. The VM is configured with SCSI disks, and SCSI disks are not visible in the Primary/Slave disk options in the BIOS (even on physical PCs/servers).

Judging from the log, the VM is powering up and the guest OS is somehow hanging.

If the yum update included patches for Spectre/Meltdown, try changing the VM hardware compatibility from 8 to 10. Version 10 is the maximum that ESXi 5.5 can support.

https://kb.vmware.com/s/article/2007240

More specifically, if the yum update included a Meltdown patch, it might be looking for the PCID feature, which is not exposed to the VM at version 8 compatibility. Your CPU is Sandy Bridge, which has the PCID feature but does not have the INVPCID instruction. If the Meltdown patch requires the INVPCID instruction, there is nothing we can do about that unless you have a server with a Haswell CPU without EVC masking.
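
To confirm which hardware compatibility level a VM is currently at before and after the upgrade, the VMX file can be inspected. As far as I know the level is recorded under the `virtualHW.version` key; the sketch below assumes that, and the helper name and sample text are made up for illustration.

```python
import re

ESXI_55_MAX_HW_VERSION = 10  # maximum for ESXi 5.5, as stated in the post above


def hardware_version(vmx_text):
    """Return the integer value of virtualHW.version, or None if absent."""
    match = re.search(
        r'^\s*virtualHW\.version\s*=\s*"?(\d+)"?', vmx_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Hypothetical VMX fragment for a version 8 VM.
sample_vmx = 'config.version = "8"\nvirtualHW.version = "8"\n'

version = hardware_version(sample_vmx)
if version is not None and version < ESXI_55_MAX_HW_VERSION:
    print("hardware version", version, "can still be raised on this host")
```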

ManoelH
Contributor

bluefirestorm Does that mean I have lost my virtual machine?

I've already changed the VM version from 8 to 10 but it's still the same.

daphnissov
Immortal

Maybe the VMDKs have somehow ended up in the wrong order. In your VM configuration, try swapping the SCSI IDs for the _1 and _2 VMDK files so that scsi0:0 is assigned to _1 and scsi1:0 is assigned to _2.
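
If the swap is done by editing the VMX file rather than through Edit Settings, something like the following could do it. This is a rough sketch, not a VMware-provided procedure: the helper name `swap_disk_files` is made up, and it assumes the VM is powered off and that a backup copy of the VMX file has been kept.

```python
import re

# Captures: indent, device (scsiN:M), the ".fileName = " part, and the value.
FILE_LINE = re.compile(r'^(\s*)(scsi\d+:\d+)(\.fileName\s*=\s*)(.*)$')


def swap_disk_files(vmx_text, dev_a="scsi0:0", dev_b="scsi1:0"):
    """Return VMX text with the fileName values of dev_a and dev_b exchanged."""
    lines = vmx_text.splitlines()
    found = {}
    for index, line in enumerate(lines):
        match = FILE_LINE.match(line)
        if match and match.group(2) in (dev_a, dev_b):
            found[match.group(2)] = (index, match)
    if len(found) != 2:
        raise ValueError("both devices must have a fileName entry")
    (index_a, match_a), (index_b, match_b) = found[dev_a], found[dev_b]
    # Keep each device key in place and exchange only the values.
    lines[index_a] = "".join(match_a.group(1, 2, 3)) + match_b.group(4)
    lines[index_b] = "".join(match_b.group(1, 2, 3)) + match_a.group(4)
    return "\n".join(lines)


before = (
    'scsi0:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_2.vmdk"\n'
    'scsi1:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_1.vmdk"'
)
print(swap_disk_files(before))
```

All other lines in the file are passed through untouched, so the result can be diffed against the backup before it is put in place.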

ManoelH
Contributor

daphnissov I made the change, but it did not work.

so.png

Any more tips before telling the company that I lost a machine from the production environment?

daphnissov
Immortal

Can you show the VMX file at this point? Or just take a screenshot under edit settings with the hard drives expanded.

ManoelH
Contributor

daphnissov Is that what you wanted to see?

disk1.png disk2.png all.png

daphnissov
Immortal

Yes, ok. Just to reduce complexity here, remove the ISO from your CD-ROM's configuration. It's a VMware Tools ISO that's still mounted and shouldn't be.

I want to understand the course of events here. Exactly what steps were taken between when this VM was booting and when it wasn't? Was it only a yum update and nothing more? You didn't reconfigure the VM? You did nothing else? Please be as specific as you can while I look through your log files.

ManoelH
Contributor

daphnissov The CD-ROM is not mounted or connected.

cdrom.png

The only thing I did was update the OS by running the yum update command, nothing more. The fault occurred after that procedure.

As the machine was slow to start, I opened its console and the screen was completely black.

daphnissov
Immortal

What's confusing here is that you seem to have only one VMDK in your VM's configuration, then you power it down and have two VMDKs, neither of which, according to the file names, corresponds to the first. So on what date and time did the VM first stop booting correctly?

As late as this time stamp (found in vmware-55.log), you only appear to have one VMDK.

2018-01-26T11:32:48.959Z| vmx| I120: DICT          scsi0:0.fileName = VM-Linux-yspp0097-LDAP.vmdk

But thereafter, you now have two (vmware-56.log and vmware.log)

2018-01-26T11:42:40.491Z| vmx| I120: DICT        scsi0:0.deviceType = scsi-hardDisk
2018-01-26T11:42:40.491Z| vmx| I120: DICT          scsi0:0.fileName = VM-Linux-yspp0097-LDAP-PROD_2.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT      sched.scsi0:0.shares = normal
2018-01-26T11:42:40.491Z| vmx| I120: DICT sched.scsi0:0.throughputCap = off
2018-01-26T11:42:40.491Z| vmx| I120: DICT           scsi0:0.present = TRUE
2018-01-26T11:42:40.491Z| vmx| I120: DICT      ethernet0.virtualDev = e1000
2018-01-26T11:42:40.491Z| vmx| I120: DICT     ethernet0.networkName = LAN
2018-01-26T11:42:40.491Z| vmx| I120: DICT     ethernet0.addressType = generated
2018-01-26T11:42:40.491Z| vmx| I120: DICT ethernet0.generatedAddress = 00:0c:29:2e:94:b5
2018-01-26T11:42:40.491Z| vmx| I120: DICT         ethernet0.present = TRUE
2018-01-26T11:42:40.491Z| vmx| I120: DICT        scsi1:0.deviceType = scsi-hardDisk
2018-01-26T11:42:40.491Z| vmx| I120: DICT          scsi1:0.fileName = VM-Linux-yspp0097-LDAP-PROD_1.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT      sched.scsi1:0.shares = normal
2018-01-26T11:42:40.491Z| vmx| I120: DICT           scsi1:0.present = TRUE

So if this VM worked fine up until it was booted at 1/26 11:42, then someone changed the VM's configuration. I want to understand the chain of events leading up to this, because a yum update inside the guest OS has no power to make such a configuration change.
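
To see every setting that differs between two power-on attempts, not just the disks, the DICT lines of two log files can be compared. The sketch below is only an illustration: it assumes the DICT line layout shown in the excerpts above, the function names are invented, and the sample text contains only the fileName lines quoted in this post.

```python
import re

# Matches the DICT lines that vmware.log prints at power-on, for example:
# 2018-01-26T11:32:48.959Z| vmx| I120: DICT   scsi0:0.fileName = disk.vmdk
DICT_LINE = re.compile(r'\|\s*vmx\|\s*I\d+:\s*DICT\s+(\S+)\s*=\s*(.*)$')


def dict_entries(log_text):
    """Return {key: value} for every DICT line in a vmware.log excerpt."""
    entries = {}
    for line in log_text.splitlines():
        match = DICT_LINE.search(line)
        if match:
            entries[match.group(1)] = match.group(2).strip()
    return entries


def changed_keys(old_log, new_log):
    """Return {key: (old value, new value)} for keys that differ."""
    old, new = dict_entries(old_log), dict_entries(new_log)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


old_log = (
    "2018-01-26T11:32:48.959Z| vmx| I120: DICT          "
    "scsi0:0.fileName = VM-Linux-yspp0097-LDAP.vmdk\n"
)
new_log = (
    "2018-01-26T11:42:40.491Z| vmx| I120: DICT          "
    "scsi0:0.fileName = VM-Linux-yspp0097-LDAP-PROD_2.vmdk\n"
    "2018-01-26T11:42:40.491Z| vmx| I120: DICT          "
    "scsi1:0.fileName = VM-Linux-yspp0097-LDAP-PROD_1.vmdk\n"
)

changes = changed_keys(old_log, new_log)
for key, (before, after) in changes.items():
    print(key, ":", before, "->", after)
```

A value of None on one side means the key did not appear in that log excerpt at all.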

bluefirestorm
Champion

It is a bit strange to see CPUID masks in the vmx configuration file (especially those with .amd).

hostCPUID.0 = "0000000d756e65476c65746e49656e69"
hostCPUID.1 = "000206d70020080017bee3ffbfebfbff"
hostCPUID.80000001 = "0000000000000000000000012c100800"
guestCPUID.0 = "0000000d756e65476c65746e49656e69"
guestCPUID.1 = "000206d200020800969822031fabfbff"
guestCPUID.80000001 = "00000000000000000000000128100800"
userCPUID.0 = "0000000d756e65476c65746e49656e69"
userCPUID.1 = "000206d700200800169822031fabfbff"
userCPUID.80000001 = "00000000000000000000000128100800"
cpuid.80000001.eax.amd = "--------------------------------"
cpuid.80000001.ebx.amd = "--------------------------------"
cpuid.80000001.ecx.amd = "--------------------------------"
cpuid.80000001.edx.amd = "-----------H--------------------"
cpuid.80000001.eax = "--------------------------------"
cpuid.80000001.ebx = "--------------------------------"
cpuid.80000001.ecx = "--------------------------------"
cpuid.80000001.edx = "-----------H--------------------"

guestCPUID.1 and userCPUID.1 look like they are masking out some capabilities from the guest OS, including PCID.

The easiest thing to try would be to put a # in front of guestCPUID.1 and userCPUID.1 and then power up. Sorry, I don't have the time and patience to go bit by bit through the differences in CPU features, but it looks like the PCID capability is masked out.
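
Commenting out the two entries can be done in any text editor, but for reference here is a small sketch of the same edit in Python. The helper name `comment_out` is made up, and it assumes the VM is powered off and the original VMX file has been backed up first.

```python
def comment_out(vmx_text, keys=("guestCPUID.1", "userCPUID.1")):
    """Prefix the 'key = value' lines for the given keys with '#'."""
    result = []
    for line in vmx_text.splitlines():
        # Compare the exact key name so guestCPUID.80000001 is not touched.
        name = line.split("=", 1)[0].strip()
        if name in keys:
            line = "#" + line
        result.append(line)
    return "\n".join(result)


sample_vmx = (
    'hostCPUID.1 = "000206d70020080017bee3ffbfebfbff"\n'
    'guestCPUID.1 = "000206d200020800969822031fabfbff"\n'
    'guestCPUID.80000001 = "00000000000000000000000128100800"\n'
    'userCPUID.1 = "000206d700200800169822031fabfbff"'
)
print(comment_out(sample_vmx))
```

Running it a second time on its own output changes nothing, because an already commented line no longer starts with one of the key names.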

hostCPUID.1 ecx = 17bee3ff

vs

guestCPUID.1 ecx = 96982203 = 1001:0110:1001:1000:0010:0010:0000:0011

The hex digit 8 above covers bits 19-16 = 1000

ECX bit 17 = 0 means PCID is masked out
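
The same bit check can be done programmatically. This sketch assumes the 32 hex digits of each CPUID value are laid out as eax, ebx, ecx, edx (which matches the ecx values read off above) and that PCID is reported in CPUID leaf 1, ECX bit 17; the function names are illustrative.

```python
PCID_BIT = 17  # CPUID leaf 1, ECX bit 17


def cpuid_registers(hex_string):
    """Split a 32-digit CPUID value into eax, ebx, ecx and edx integers."""
    if len(hex_string) != 32:
        raise ValueError("expected 32 hex digits")
    names = ("eax", "ebx", "ecx", "edx")
    return {
        name: int(hex_string[i * 8:(i + 1) * 8], 16)
        for i, name in enumerate(names)
    }


def has_pcid(leaf1_hex):
    """Return True if ECX bit 17 is set in a CPUID leaf 1 value."""
    return bool((cpuid_registers(leaf1_hex)["ecx"] >> PCID_BIT) & 1)


# Leaf 1 values copied from the VMX file quoted in this thread.
values = {
    "hostCPUID.1": "000206d70020080017bee3ffbfebfbff",
    "guestCPUID.1": "000206d200020800969822031fabfbff",
    "userCPUID.1": "000206d700200800169822031fabfbff",
}

for name, value in values.items():
    print(name, "PCID:", has_pcid(value))
# hostCPUID.1 PCID: True
# guestCPUID.1 PCID: False
# userCPUID.1 PCID: False
```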
