Unable to power on VM - ESXi 5.5.0

ManoelH · ‎01-26-2018

I have RHEL 6.9 running in a virtual machine. After I ran the "yum update" command, I can no longer connect to the VM.

The BIOS shows that it did not find the disk. The log is attached.

Can you help me?

admin · ‎01-26-2018

Can you please upload the VMX file and a screenshot of the VM folder?

ManoelH · ‎01-26-2018

ranchuab, I updated the first post with the requested information.

Thanks for the help.

daphnissov · ‎01-26-2018

Did you take a snapshot before yum updates? If so, you might want to revert.

------------------
How to Ask for Help on Tech Forums
https://neonmirrors.net

ManoelH · ‎01-26-2018

daphnissov, I did not take a snapshot beforehand.

daphnissov · ‎01-26-2018

If you didn't, then it's a guest-related issue at this point and nothing to do with ESXi. The VM is powered on; it just won't boot.


ManoelH · ‎01-26-2018

I updated the first post with more screenshots.

daphnissov · ‎01-26-2018

Yeah there's nothing wrong there. If your VM was working fine one minute and then you performed a yum update and now it won't boot, it's a problem with packages/kernel that got upgraded, not the VM's configuration that changed.


admin · ‎01-26-2018

Have you made any changes to the VM configuration? From what I can see in the VMware log, it looks like you removed the disk and re-added it. Did you?

See the timestamps in the VMware log:

2018-01-26T11:32:48.959Z| vmx| I120: DICT scsi0:0.fileName = VM-Linux-yspp0097-LDAP.vmdk
2018-01-26T11:32:48.960Z| vmx| I120: DICT scsi1:0.fileName = /vmfs/volumes/52673d7e-c347a1c2-1b79-2c59e53cc874/VM-Linux-yspp0097-LDAP-PROD/VM-Linux-yspp0097-LDAP-PROD.vmdk

Changes made here:

2018-01-26T11:42:40.491Z| vmx| I120: DICT scsi0:0.fileName = VM-Linux-yspp0097-LDAP-PROD_2.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT scsi1:0.fileName = VM-Linux-yspp0097-LDAP-PROD_1.vmdk

Current VMX file:

scsi1:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_1.vmdk"

scsi0:0.fileName = "VM-Linux-yspp0097-LDAP-PROD_2.vmdk"

VM-Linux-yspp0097-LDAP.vmdk is missing here.

ManoelH · ‎01-26-2018

ranchuab, I did not do that on the VM before the yum update. After the system update, because the machine would no longer boot, I did a vMotion to another datastore and host.

daphnissov, it is not an operating system problem. I have already opened a support ticket with Red Hat, and they confirmed that the VM does not load the disk, so I cannot enter rescue mode to try anything.

bluefirestorm · ‎01-26-2018

There is nothing wrong with the virtual BIOS settings for the disk. The VM is using SCSI disks, and SCSI disks are not visible in the Primary/Slave disk options in the BIOS (even on physical PCs/servers).

Judging from the log, the VM is powering up and the guest OS is somehow hung.

If the yum update included patches for Spectre/Meltdown, try changing the VM hardware compatibility from 8 to 10. Version 10 is the maximum that ESXi 5.5 can support.

https://kb.vmware.com/s/article/2007240

More specifically, if the yum update included a Meltdown patch, it might be looking for the PCID feature, which is not exposed to the VM at version 8 compatibility. Your CPU is Sandy Bridge, which has the PCID feature but not the INVPCID instruction. If the Meltdown patch requires the INVPCID instruction, there is nothing we can do about that unless you have a server with a Haswell CPU without EVC masking.

ManoelH · ‎01-27-2018

bluefirestorm, does that mean I have lost my virtual machine?

I've already changed the VM version from 8 to 10 but it's still the same.

daphnissov · ‎01-27-2018

Maybe somehow the VMDKs have become unordered. In your VM configuration, try swapping the SCSI ids for the _1 and _2 VMDK files so that scsi0:0 is assigned to _1 and scsi1:0 is assigned to _2.


ManoelH · ‎01-27-2018

daphnissov I made the change, but it did not work.

Any more tips before telling the company that I lost a machine from the production environment?

daphnissov · ‎01-27-2018

Can you show the VMX file at this point? Or just take a screenshot under edit settings with the hard drives expanded.


ManoelH · ‎01-27-2018

daphnissov Is that what you wanted to see?

daphnissov · ‎01-27-2018

Yes, ok. Just to reduce complexity here, remove the ISO from your CD-ROM's configuration. It's a VMware Tools ISO that's still mounted and shouldn't be.

I want to understand the course of events here. Exactly what steps were taken between when this VM was booting and when it wasn't? Was it only a yum update and nothing more? You didn't reconfigure the VM? You did nothing else? Please be as specific as you can while I look through your log files.


ManoelH · ‎01-27-2018

daphnissov The CD-ROM is not mounted or connected.

I only performed the OS update by running the yum update command and nothing more. The fault appeared after that procedure.

As the machine was slow to start, I opened its console and it was all black.

daphnissov · ‎01-27-2018

What's confusing here is that you seem to have only one VMDK in your VM's configuration, then you power it down and have 2 VMDKs, neither of which, according to the file name, corresponds to the first. So on what date and time did the VM first stop booting correctly?

As late as this time stamp (found in vmware-55.log), you only appear to have one VMDK.

2018-01-26T11:32:48.959Z| vmx| I120: DICT scsi0:0.fileName = VM-Linux-yspp0097-LDAP.vmdk

But thereafter, you have two (vmware-56.log and vmware.log):

2018-01-26T11:42:40.491Z| vmx| I120: DICT        scsi0:0.deviceType = scsi-hardDisk
2018-01-26T11:42:40.491Z| vmx| I120: DICT          scsi0:0.fileName = VM-Linux-yspp0097-LDAP-PROD_2.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT      sched.scsi0:0.shares = normal
2018-01-26T11:42:40.491Z| vmx| I120: DICT sched.scsi0:0.throughputCap = off
2018-01-26T11:42:40.491Z| vmx| I120: DICT           scsi0:0.present = TRUE
2018-01-26T11:42:40.491Z| vmx| I120: DICT      ethernet0.virtualDev = e1000
2018-01-26T11:42:40.491Z| vmx| I120: DICT     ethernet0.networkName = LAN
2018-01-26T11:42:40.491Z| vmx| I120: DICT     ethernet0.addressType = generated
2018-01-26T11:42:40.491Z| vmx| I120: DICT ethernet0.generatedAddress = 00:0c:29:2e:94:b5
2018-01-26T11:42:40.491Z| vmx| I120: DICT         ethernet0.present = TRUE
2018-01-26T11:42:40.491Z| vmx| I120: DICT        scsi1:0.deviceType = scsi-hardDisk
2018-01-26T11:42:40.491Z| vmx| I120: DICT          scsi1:0.fileName = VM-Linux-yspp0097-LDAP-PROD_1.vmdk
2018-01-26T11:42:40.491Z| vmx| I120: DICT      sched.scsi1:0.shares = normal
2018-01-26T11:42:40.491Z| vmx| I120: DICT           scsi1:0.present = TRUE

So if this VM worked fine up until it was booted at 1/26 11:42, then someone changed the VM's configuration. I want to understand the chain of events leading up to this, because a yum update inside the guest OS has no power to make such a configuration change.
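To make this kind of before/after comparison repeatable, here is a minimal Python sketch (not a VMware tool) that pulls the scsiN:M.fileName entries out of the DICT lines of two vmware*.log files and lists what changed between them. The regular expression is an assumption based only on the log lines quoted in this thread, and the function names are hypothetical.

```python
import re

# Assumed layout, taken from the DICT lines quoted in this thread:
# <timestamp>| vmx| I120: DICT   scsi0:0.fileName = <path>.vmdk
DISK_LINE = re.compile(
    r"^(?P<ts>\S+?)\|\s*vmx\|\s*\S+\s+DICT\s+"
    r"(?P<dev>scsi\d+:\d+)\.fileName\s*=\s*(?P<file>.+?)\s*$"
)


def disks_in_log(log_text):
    """Return {device: fileName} for every scsiN:M.fileName DICT entry."""
    disks = {}
    for line in log_text.splitlines():
        match = DISK_LINE.match(line)
        if match:
            disks[match.group("dev")] = match.group("file").strip('"')
    return disks


def diff_disks(before, after):
    """List (device, old_file, new_file) where the backing file differs."""
    changes = []
    for dev in sorted(set(before) | set(after)):
        if before.get(dev) != after.get(dev):
            changes.append((dev, before.get(dev), after.get(dev)))
    return changes
```

Reading vmware-55.log and vmware.log into strings and passing the results of disks_in_log to diff_disks should surface the scsi0:0 and scsi1:0 changes discussed above; a value of None means the device was absent in that log.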


bluefirestorm · ‎01-27-2018

It is a bit strange to see CPUID masks in the vmx configuration file (especially those with .amd).

hostCPUID.0 = "0000000d756e65476c65746e49656e69"

hostCPUID.1 = "000206d70020080017bee3ffbfebfbff"

hostCPUID.80000001 = "0000000000000000000000012c100800"

guestCPUID.0 = "0000000d756e65476c65746e49656e69"

guestCPUID.1 = "000206d200020800969822031fabfbff"

guestCPUID.80000001 = "00000000000000000000000128100800"

userCPUID.0 = "0000000d756e65476c65746e49656e69"

userCPUID.1 = "000206d700200800169822031fabfbff"

userCPUID.80000001 = "00000000000000000000000128100800"

cpuid.80000001.eax.amd = "--------------------------------"

cpuid.80000001.ebx.amd = "--------------------------------"

cpuid.80000001.ecx.amd = "--------------------------------"

cpuid.80000001.edx.amd = "-----------H--------------------"

cpuid.80000001.eax = "--------------------------------"

cpuid.80000001.ebx = "--------------------------------"

cpuid.80000001.ecx = "--------------------------------"

cpuid.80000001.edx = "-----------H--------------------"

It looks like guestCPUID.1 and userCPUID.1 are masking out some capabilities from the guest OS, including PCID.

The easiest would be to just put a # in front of guestCPUID.1 and userCPUID.1 and try to power up. Sorry, I don't have the time and patience to examine the values bit by bit and detail the differences in CPU features, but it looks like the PCID capability is masked out.
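If editing the file by hand feels risky, the change can be sketched in Python as a plain text transform on a copy of the VMX file. This is only an illustration: the function name is hypothetical, and it assumes, as suggested above, that a line starting with # is ignored when the VMX file is read. Work on a backup copy with the VM powered off.

```python
def comment_out_keys(vmx_text, keys):
    """Prefix matching 'key = value' lines with '# '; leave all others as-is."""
    wanted = {key.lower() for key in keys}
    out = []
    for line in vmx_text.splitlines():
        # The key is whatever sits left of the first '=' on the line.
        key = line.split("=", 1)[0].strip().lower()
        if "=" in line and key in wanted:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Calling comment_out_keys(text, ["guestCPUID.1", "userCPUID.1"]) leaves every other entry, including guestCPUID.0 and the hostCPUID lines, untouched, and a line that is already commented out is not prefixed a second time.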

hostCPUID.1 ecx = 17bee3ff

vs

guestCPUID.1 ecx = 96982203 = 1001:0110:1001:1000:0010:0010:0000:0011

The hex digit 8 above covers bits 19-16 = 1000

Bit 17 of ecx = 0 means PCID is masked out
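The same check can be done without counting bits by hand. The sketch below is a rough illustration that assumes each 32-digit CPUID string is laid out as EAX, EBX, ECX, EDX (8 hex digits each), which matches the ecx values picked out above; the helper names are made up for this example.

```python
PCID_BIT = 17  # CPUID leaf 1, ECX bit 17 = PCID


def split_cpuid(value):
    """Split a 32-hex-digit CPUID string into EAX, EBX, ECX, EDX integers."""
    if len(value) != 32:
        raise ValueError("expected 32 hex digits")
    eax, ebx, ecx, edx = (int(value[i:i + 8], 16) for i in range(0, 32, 8))
    return {"eax": eax, "ebx": ebx, "ecx": ecx, "edx": edx}


def has_bit(reg, bit):
    """True if the given bit position is set in the register value."""
    return bool((reg >> bit) & 1)


def masked_bits(host_reg, guest_reg):
    """Bit positions set on the host register but cleared for the guest."""
    hidden = host_reg & ~guest_reg
    return [bit for bit in range(32) if (hidden >> bit) & 1]


host = split_cpuid("000206d70020080017bee3ffbfebfbff")   # hostCPUID.1
guest = split_cpuid("000206d200020800969822031fabfbff")  # guestCPUID.1

print("host PCID:", has_bit(host["ecx"], PCID_BIT))
print("guest PCID:", has_bit(guest["ecx"], PCID_BIT))
print("ECX bits hidden from guest:", masked_bits(host["ecx"], guest["ecx"]))
```

The list from masked_bits shows every ECX feature bit the host reports but the guest does not, so PCID is likely not the only capability being hidden.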
