These types of machine check exception (MCE) errors often indicate a hardware failure, so you should check with Supermicro about diagnosing that. That said, you are behind on patches for 6.5, so you should plan to install P2, which came out a couple of weeks ago, and then observe the behavior.
Is your RAM ECC, and do you have the ECC setting turned on in the BIOS? Name-brand RAM? RAM purchased as a set?
Any goofy PCIe cards?
Running MEMTEST. Will post screenshot when it's done.
Yes, the RAM is ECC: KVR21R15D4/16 https://www.kingston.com/datasheets/KVR21R15D4_16.pdf
No, the RAM was purchased as 4 single DIMMs.
The strange thing is that you can boot the server and it shows no problems during boot.
One time it reports that the problematic DIMM is in slot 1, another time in slot 2. The error is inconsistent.
The only PCIe card in the server is an LSI 9260-4i RAID controller, but the hard drive system looks good.
There was an additional 4-port Intel i350 RJ-45 PCI card, but I removed it.
There are no specific ECC settings in the BIOS.
I cross-flashed a Dell PERC H200 with retail firmware and tried to use it in an HP Z800 workstation with 1 CPU, and had all kinds of weird memory errors in Windows 7 x64.
So if you remove all the cards and memtest finishes cleanly, but it fails with the cards in that one box, it could be a bad card.
Note the speed of your memtest run and see whether the other machines like it run at the same speed.
It could be that 1 stick of RAM is incompatible. Are all the DIMMs the same part number and voltage?
Sounds like you have some flaky hardware.
Memtest is still running: 6 hours 33 min.
60 out of 64 GB tested and no errors so far.
Yes, all DIMMs are the same part number and voltage.
Look at the screenshot below
You got me interested. Before I used this LSI 9260-4i, I tried to use a PERC H310 in this server.
No matter what I did, ESXi was unable to detect any volumes, even with a custom ISO or with the Dell ESXi ISO.
Can you run memtest on another box that is identical? 6.5 hours for 64 GB is a long time for a DDR3/DDR4 system. There might be a different version of Memtest that will report your chipset correctly.
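To put the "6.5 hours for 64 GB" remark into numbers, here is a rough Python sketch using the progress figures posted in this thread (60 of 64 GB after 6 hours 33 minutes). It assumes the progress counter grows linearly, and the function names are made up for illustration. Note that this is test coverage per hour, not memory bandwidth, since a pass sweeps the memory several times.

```python
def memtest_rate(tested_gb, hours, minutes):
    """Return (GB per hour, MB per second) for one memtest progress reading."""
    elapsed_h = hours + minutes / 60
    gb_per_hour = tested_gb / elapsed_h
    mb_per_second = tested_gb * 1024 / (elapsed_h * 3600)
    return gb_per_hour, mb_per_second


def projected_total_hours(total_gb, tested_gb, hours, minutes):
    """Estimate the time to cover total_gb at the rate observed so far."""
    gb_per_hour, _ = memtest_rate(tested_gb, hours, minutes)
    return total_gb / gb_per_hour


# Figures from the thread: 60 of 64 GB tested after 6 h 33 min.
rate_gb_h, rate_mb_s = memtest_rate(60, 6, 33)
total_h = projected_total_hours(64, 60, 6, 33)
print(f"{rate_gb_h:.1f} GB/h, {rate_mb_s:.2f} MB/s, about {total_h:.1f} h for 64 GB")
```

Running the same calculation on an identical box gives a number to compare against, instead of just a feeling that it is slow.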
This one is the latest.
I will not be able to test on another machine until the 5th of January.
It's holidays here from the 1st to the 9th.
I need to make it work somehow until the 10th. =/
This is the latest version of memtest.
Feel free to suggest any other one. This is Memtest86+ - Advanced Memory Diagnostic Tool.
I can run it via IPMI with a bootable ISO.
There are 3 versions of Memtest:
2) the one you have (v5)
3) another one, I think v4
They all detect hardware differently.
Anyway, I would take 1/2 of your RAM out and test with just 1/2 and note the speed and time, then test the other 1/2, then test with all the RAM like you are doing now.
6.5 hours for a full pass, or 1.25 passes, is too long.
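The half-and-half approach can be written down as a small test plan. This is only a sketch in Python: the slot labels are placeholders, and which slots may be populated together depends on the motherboard manual, so treat the grouping as an assumption.

```python
def halving_plan(dimms):
    """Split the installed DIMMs into three memtest runs: each half, then all."""
    mid = len(dimms) // 2
    return [
        ("first half", dimms[:mid]),
        ("second half", dimms[mid:]),
        ("all installed", list(dimms)),
    ]


# Placeholder labels for the 4 DIMMs mentioned in the thread.
dimms = ["slot 1", "slot 2", "slot 3", "slot 4"]
for label, group in halving_plan(dimms):
    print(f"{label}: install {', '.join(group)}, run memtest, note time and errors")
```

If one half is clearly slower or throws errors, split that half again to narrow it down to a single DIMM or slot.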
I too am doing some shutdown work. I have issues with VDP, so I was hoping someone here would see that and offer some suggestions. LOL
I can feel your pain bro. LOL
Is this hardware on the HCI?
Seems like hardware.
But it is also possible that I am a dumbass.
Exactly the same ESXi install on an old HP Gen8 works perfectly.
I hate Supermicro servers so much.
Or do you mean memtest HCI?