jasonboche
Immortal

Trust or rebuild the ESX host after ungraceful shutdown?

Do you trust a production ESX host after it has been shut down ungracefully, such as after a power outage or any other type of outage where the ESX OS was not shut down properly?

The resulting error messages that show up on the subsequent reboot make me a bit nervous, to the point that I'd like to rebuild the host for my own peace of mind. File corruption, journaling errors, and a tainted OS/kernel that no longer matches the other ESX hosts in the data center bother me to no end.

VCDX3 #34, VCDX4, VCDX5, VCAP4-DCA #14, VCAP4-DCD #35, VCAP5-DCD, VCPx4, vEXPERTx4, MCSEx3, MCSAx2, MCP, CCAx2, A+

Accepted Solutions
oreeh
Immortal

If it shows errors, I rebuild immediately.

If it doesn't show errors and I have enough free resources in the cluster, I rebuild it right away; otherwise I plan for some downtime and rebuild it ASAP.

6 Replies
Texiwill
Leadership

Hello,

Depending on the error, I would rebuild. What specific error?

If it is journaling errors, do they disappear when you reboot the box, or are they consistent? You could also force an fsck to run on the next boot to get a file-level scan, or boot in Linux mode, in single-user mode, and run fsck from there. If fsck comes back clean, there should be no need to rebuild.

I have had power failures where rebooting the box is all it took and all was well. I have had one failure that corrupted the disk. I found this by the errors during reboot and a subsequent fsck that had literally 1000s of errors. I rebuilt at that time.

Other times, I see no need to rebuild the server. But it depends on the error and consistency of the errors.
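
To make the "depends on the error and its consistency" rule of thumb concrete, here is a minimal Python sketch. It assumes you have noted the fsck exit status from each of the last few boots. The exit-status bit meanings are the ones documented in the fsck(8) man page; the function names and the keep/recheck/rebuild policy are hypothetical, meant only to illustrate the reasoning above, and are not part of any VMware tooling.

```python
from typing import List

# fsck(8) returns a bitmask; each bit below is documented in the man page.
FSCK_FLAGS = {
    1: "errors corrected",
    2: "system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(status: int) -> List[str]:
    """Translate an fsck exit status into human-readable findings."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_FLAGS.items() if status & bit]


def suggest_action(statuses: List[int]) -> str:
    """Suggest 'keep', 'recheck', or 'rebuild' (hypothetical policy).

    statuses: fsck exit statuses from consecutive boots, oldest first.
    """
    if not statuses:
        raise ValueError("need at least one fsck result")
    latest = statuses[-1]
    # Uncorrected or operational errors: do not trust the install.
    if latest & 4 or latest & 8:
        return "rebuild"
    # Errors on two checks in a row count as recurring.
    if len(statuses) >= 2 and all(s != 0 for s in statuses[-2:]):
        return "rebuild"
    if latest == 0:
        return "keep"
    # Errors seen once and corrected: reboot and check again.
    return "recheck"


if __name__ == "__main__":
    print(describe_fsck_status(5))
    print(suggest_action([1, 0]))
```

A single corrected error yields "recheck" rather than "keep", so the host has to show one clean pass before it is trusted again.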

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2022,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
jasonboche
Immortal

Well, this has happened a few times in the past for various reasons, but in this latest particular instance:

fsck errors.

HP Insight Management Home Page is no longer accessible.

I always rebuild. The rebuild process is so quick and automatic that it's really a no-brainer. I was just wondering what other people thought.

Texiwill
Leadership

Hello,

I agree in this case. However, I always check first: rebuilding, while quick, still takes setup time, and if I do not have to rebuild, I would rather not. I am also interested in what other people think.

Best regards,

Edward

msmenne17
Enthusiast

If the rebuild is scripted and quick, I would rebuild it for peace of mind. It doesn't take long and then you KNOW everything is ok as far as the install.

If you're getting fsck errors, then something with the file system got funked up. Maybe corrected, maybe not. If they are recurring on subsequent reboots, I wouldn't personally trust it.

Michael

DFATAnt
Enthusiast

Don't waste any more time. Rebuild it and you'll sleep easy at night...
