I'm wondering whether ESX/ESXi 3.5 monitors the physical hardware for errors, similar to Solaris FMA (which monitors each component for soft errors). Ideally, ESX would monitor for soft errors and, once the errors pass a threshold, proactively offline the faulty component (e.g. a core or an entire socket) and restart the affected VMs if it has to.
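To make the idea concrete, here is a rough sketch of the kind of threshold policy I have in mind. Everything in it is hypothetical (the class name, the component names, the threshold value); it is not an ESX or Solaris FMA interface, just an illustration of "count soft errors, offline the component once a threshold is passed, and report which VMs need a restart".

```python
from collections import defaultdict


class SoftErrorMonitor:
    """Hypothetical threshold policy; not an ESX or Solaris FMA API."""

    def __init__(self, threshold):
        self.threshold = threshold      # soft errors tolerated per component
        self.counts = defaultdict(int)  # component name -> soft errors seen
        self.offline = set()            # components already taken offline

    def record_soft_error(self, component, vms_on_component):
        """Count one soft error and return the VMs to restart, if any."""
        if component in self.offline:
            return []  # already retired, nothing more to do
        self.counts[component] += 1
        if self.counts[component] > self.threshold:
            # Threshold passed: retire the component proactively.
            self.offline.add(component)
            return list(vms_on_component)
        return []


# Assumed example: a core in socket 0 reports three soft errors.
monitor = SoftErrorMonitor(threshold=2)
for _ in range(3):
    to_restart = monitor.record_soft_error("socket0/core1", ["vm-a", "vm-b"])
```

With a threshold of 2, the third error takes the core offline and hands back the list of VMs that were running on it.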
Two related questions:
1. If there is a core failure (in socket 0 in a Xeon box), does ESX crash? I hope not.
2. If there is a memory failure (say, in one of the DIMMs), how does ESX handle it?
Thanks
e1
Hello,
My experience shows the following:
1. If there is a core failure (in socket 0 in a Xeon box), does ESX crash? I hope not.
It crashes or hangs, depending on the problem; I have seen both. In some cases only the service console (SC) crashes, but then management is impossible, so you have to shut down the host and reboot.
2. If there is a memory failure (say, in one of the DIMMs), how does ESX handle it?
It depends on when the failure happens. If the memory is not yet in use, nothing happens; if it is in use, the host crashes. However, if the host has hardware RAID memory, nothing may happen. If the memory is in use only by the SC, only the SC may crash.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
Blue Gears and SearchVMware Pro Blogs: http://www.astroarch.com/wiki/index.php/Blog_Roll
Top Virtualization Security Links: http://www.astroarch.com/wiki/index.php/Top_Virtualization_Security_Links
Thanks.
From what I know, memory mirroring (especially at the hardware level, transparent to the OS) is available on "UNIX" platforms (e.g. SPARC), but not on the x64 architecture.
cheers!
e1
That would be incorrect. There are AMD-based machines with this technology.
--Matt