Scepter
Contributor

My ESXi is down.

As shown in the figure, it prompts the following.

1.jpg

I'm asking all the experts and official VMware technicians to help me.

My client is already very angry. They want to kill me and throw me into the hot pot.

Accepted Solutions
ThompsG
Virtuoso

Hi Scepter,

As people have mentioned, this "can" be caused by faulty hardware, but it can also be caused by faulty software. The ESXi build you are running is GA (General Availability) and is therefore quite old. I would suggest that you schedule some downtime (assuming you have some uptime ;)) and apply the latest patches. There are a number of VMware KB articles related to this message, and one of the newer patches may resolve the issue: based on the PSOD, it appears ESXi was migrating the vCPU to another pCPU when the exception occurred.

Anywho, make sure you have a good recovery path, especially as you appear to have customers relying on the hosting layer. I'm hoping that you can at least vMotion the workloads to another server while you patch this one?

Also make sure that the firmware/BIOS on the hardware is up to date. There are a number of instances where this has been resolved by BIOS/firmware upgrades. Same advice as above: make sure you have good, reliable backups or a known recovery point.

Kind regards.

9 Replies
iiliev
VMware Employee

The attached screenshot shows a page fault error (Exception 14) that happens when a memory page is requested but hasn't been loaded successfully. This could be due to either a hardware or a software issue (faulty hardware, problematic device driver, etc.).
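For context on the "Exception 14" wording: on x86 processors the exception vectors are numbered, and vector 14 is the page fault (#PF). Below is a minimal Python sketch for looking up a vector number seen on a PSOD. The table covers only a few of the architecturally defined vectors, and `describe_exception` is a hypothetical helper name, not part of any VMware tool.

```python
# A few architecturally defined x86 exception vectors (not a complete list).
X86_EXCEPTIONS = {
    0: ("#DE", "Divide Error"),
    6: ("#UD", "Invalid Opcode"),
    8: ("#DF", "Double Fault"),
    13: ("#GP", "General Protection"),
    14: ("#PF", "Page Fault"),
    18: ("#MC", "Machine Check"),
}


def describe_exception(vector: int) -> str:
    """Return a readable name for an x86 exception vector number.

    Hypothetical helper for illustration only.
    """
    mnemonic, name = X86_EXCEPTIONS.get(vector, ("?", "not in this table"))
    return f"Exception {vector}: {mnemonic} {name}"


print(describe_exception(14))
```

The vector number only names the kind of fault. It does not say whether hardware or software triggered it, which is why the core dump still needs to be analyzed.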

I'd suggest opening an official support request so that the generated core dump file can be properly analyzed.

anurag2189
Enthusiast

Do as suggested by iiliev.

A PSOD occurs due to a hardware or software (driver) issue.

Follow the VMware Knowledge Base to retrieve the core dump. After the reboot you should be able to log in to the ESXi shell and get the dump.

It's also good to capture SOL logs during the system reboot to find out whether there is a bad drive or another hardware issue.

The vmkernel log will be helpful in identifying which driver or device caused the event.

serveradminist2
Contributor

Which hardware are you using?

Scepter
Contributor

Intel 1U server, motherboard is S2600GZ

serveradminist2
Contributor

I meant to ask which vendor's hardware you are using, i.e. HP or Dell.

IT_pilot
Expert

Perhaps it will be faster to connect another USB drive and try installing ESXi on it (try different versions: 6.5, 6.7). If none of the versions installs successfully, then the problem is in the hardware. If any version does install, then check the hardware against the HCL (Hardware Compatibility List). Sometimes reinstalling ESXi and setting it up again is faster than hunting for the error.

http://it-pilot.ru

serveradminist2
Contributor

If you have an HP server, you need to fetch the iLO log.

If you have Dell hardware, you need to fetch the Dell DSET report.

I think the memory is causing the problem; please replace it only if the logs confirm that.

Arthos
Enthusiast

Hi Scepter,

This is an ESXi 6.7 GA build. For immediate relief, I would suggest rebooting the server, collecting a vm-support bundle, and contacting the VMware support team. The issue appears to be in the CPU scheduler area and may have been fixed by now. If your employer wants to solve the problem quickly, update to the latest version of ESXi.

Please upvote if you find the answer helpful.