VMware Cloud Community
marioxherrera
Contributor

Intel SSD Datacenter Tool 3.0.14 causing Purple Screen of Death

We have a new Supermicro VMware vSAN environment. Whenever we remove or add an NVMe drive, the host purple-screens. This is a common result when the Intel VMD feature is not enabled for NVMe drives (it is like hot-removing memory from the CPU without support for it). However, we have confirmed that Intel VMD is enabled in the BIOS of our Supermicro server for all PStack/PCI switches and NVMe drives.

Please note that:

  • We’re running the latest async NVMe drivers for VMware - 1.4.0.1016-1OEM.650.0.0.4598673
  • We’re running the latest Intel SSD DC Tool - 3.0.14-400
  • We’re running the latest NVMe drive firmware (P4510 - VDV10131, P4800X - E2010435, P4600 - QDV101D1)
  • All driver and firmware versions match those supported by the latest Intel SSD DC Tool available for VMware (version above)
  • ESXi 6.5 U2 GA
  • Server model: Supermicro 2029U-TN24R4T
  • 12 x Intel NVMe SSD P4510 (capacity drives) and 4 x Intel NVMe SSD P4800X (cache drives)

While troubleshooting the purple screen, which occurred every time we triggered a hot-plug event on a capacity drive, we captured the following error:

2018-10-12T18:12:26.307Z cpu27:79194)@BlueScreen: #PF Exception 14 in world 79194:isdct IP 0x418037e024f5 addr 0x430ae01c50e6

PTEs:0x800000010080c023;0x800000808e2b0063;0x800000012661b063;0x0;

2018-10-12T18:12:26.307Z cpu27:79194)Code start: 0x418037400000 VMK uptime: 0:00:36:26.986

2018-10-12T18:12:26.307Z cpu27:79194)0x43932ad1b890:[0x418037e024f5]nvme_MgmtAdminCmds@(intel-nvme-vmd)#<None>+0x3d stack: 0x418037e03be0
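For anyone comparing their own dumps against ours, here is a minimal Python sketch of how the lines above can be read. The function name `summarize_psod` and the regular expressions are assumptions of ours, fitted only to the three timestamped lines quoted in this post; it is not a general vmkernel log parser. It pulls out the world that was running when the fault hit, the function and module in the top stack frame, and the distance of the faulting IP from the reported code start.

```python
import re

# The three timestamped lines quoted above, copied verbatim from this post.
PSOD_LINES = [
    "2018-10-12T18:12:26.307Z cpu27:79194)@BlueScreen: #PF Exception 14 in world 79194:isdct IP 0x418037e024f5 addr 0x430ae01c50e6",
    "2018-10-12T18:12:26.307Z cpu27:79194)Code start: 0x418037400000 VMK uptime: 0:00:36:26.986",
    "2018-10-12T18:12:26.307Z cpu27:79194)0x43932ad1b890:[0x418037e024f5]nvme_MgmtAdminCmds@(intel-nvme-vmd)#<None>+0x3d stack: 0x418037e03be0",
]

# Patterns are assumptions based only on the line shapes shown above.
FAULT_RE = re.compile(
    r"@BlueScreen: #PF Exception (?P<exc>\d+) in world (?P<world_id>\d+):(?P<world>\S+) "
    r"IP (?P<ip>0x[0-9a-f]+) addr (?P<addr>0x[0-9a-f]+)"
)
CODE_START_RE = re.compile(r"Code start: (?P<start>0x[0-9a-f]+)")
FRAME_RE = re.compile(
    r"\)(?P<sp>0x[0-9a-f]+):\[(?P<ip>0x[0-9a-f]+)\]"
    r"(?P<func>\w+)@\((?P<module>[^)]+)\)#[^+]*\+(?P<off>0x[0-9a-f]+)"
)


def summarize_psod(lines):
    """Collect the faulting world, code start and stack frames from PSOD lines."""
    summary = {"frames": []}
    for line in lines:
        if m := FAULT_RE.search(line):
            summary["exception"] = int(m["exc"])
            summary["world"] = m["world"]
            summary["fault_ip"] = int(m["ip"], 16)
            summary["fault_addr"] = int(m["addr"], 16)
        elif m := CODE_START_RE.search(line):
            summary["code_start"] = int(m["start"], 16)
        elif m := FRAME_RE.search(line):
            summary["frames"].append(
                {
                    "ip": int(m["ip"], 16),
                    "func": m["func"],
                    "module": m["module"],
                    "offset": int(m["off"], 16),
                }
            )
    return summary


if __name__ == "__main__":
    info = summarize_psod(PSOD_LINES)
    top = info["frames"][0]
    print("world:", info["world"])
    print("top frame:", f'{top["func"]}@{top["module"]}+{hex(top["offset"])}')
    print("top frame IP matches fault IP:", top["ip"] == info["fault_ip"])
    print("fault IP - code start:", hex(info["fault_ip"] - info["code_start"]))
```

Read this way, the page fault (exception 14) occurred in the world named isdct, and the faulting IP is the same address as the top frame, nvme_MgmtAdminCmds in the intel-nvme-vmd module.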

Because the failure names the isdct world, we uninstalled ISDCT. After that, we removed and added multiple NVMe drives and no purple screen occurred. We then reinstalled ISDCT and the purple screens returned; once we removed it again, they stopped. We even hot-plugged 8 disks at the same time without a purple screen.

This led us to think that Intel SSD Datacenter Tool 3.0.14 might be the root cause of the purple screens during hot-plug. We enabled the tool's logging mode and reproduced the purple screen, but the logs gave us no further detail.

Intel released a new version of the tool, 3.0.15, about a week ago, addressing “Miscellaneous Bugs”. So far it is available only for Windows and Linux, and it seems that Intel will release it for VMware soon. That version may well include a fix for this issue, but we haven’t found any official note about the problem.

For now, with isdct removed, hot-plug works well and no purple screen occurs. However, as you know, this utility is important for reading NVMe drive information and sensors and for other maintenance and reporting tasks.

Is anyone else experiencing this issue? We're troubleshooting with Supermicro, but this appears to be an issue between the Intel SSD DC Tool and the VMware environment, which we wouldn't expect given that isdct is just a tool. The Intel NVMe drivers themselves seem to be working without problems.

I'd appreciate your help.

Mario Herrera

1 Reply
Dave_the_Wave
Hot Shot

I'm a lover of Intel SSD drives; every Windows PC I've owned has one.

Are you installing the Windows tool on a Guest OS? And you're saying it's designed to manage the SSD from within a Guest OS?
