19 Replies Latest reply on Nov 13, 2019 2:26 PM by lukebes1010
      • 15. Re: ESXi 6.7u3 PSoD Reoccurrences (Exception 13) - what could be the root cause
        Phel Novice

        The NVME is used as a VMFS Datastore, not as a cache store.

         

        I believe the default disk dump placement location is the Intel SSD. Both of these have plenty of space left on the device (at least half the space is unallocated).

         

        I will have to look at changing the disk dump location, though, if the space is not adequate. Of the 1 TB on the Samsung SSD datastore, roughly 400-500 GB is still available, which I would assume is sufficient space for ESXi to write a dump. The other SSDs should have sufficient space available as well.

        • 16. Re: ESXi 6.7u3 PSoD Reoccurrences (Exception 13) - what could be the root cause
          Amin Masoudifard Expert

          So if it's not related to the disks backing your datastores, the PSOD can be related to any of the following issues:

          1. Incompatibility between a specific advanced feature of the physical CPU and the ESXi version (I had this problem before with Hyper-Threading and ESXi 6.0 on a ProLiant DL380 G7)

          2. Incompatibility with the memory configuration (this is less likely, because if you had this issue you would have run into it much earlier, but it doesn't hurt to check the HCL again)

          3. Any physical/logical problem in the disk/datastore configured for the VMs and the host as the cache/swap space for memory ballooning

          4. An old firmware version on your physical server (if you haven't upgraded it for a long time)

          Finally, if you can't find the real cause of your issue, you should contact technical support.

          Please mark my comment as the Correct Answer if this solution resolved your problem
          • 17. Re: ESXi 6.7u3 PSoD Reoccurrences (Exception 13) - what could be the root cause
            Phel Novice

            I ended up rolling back to 6.7 flat, as I can't recall if this occurred before or after I upgraded to 6.7u3.

             

            Waiting to see if that makes a difference. If it does, then I know it is software related. If it doesn't, I can re-upgrade and begin testing hardware.

            • 18. Re: ESXi 6.7u3 PSoD Reoccurrences (Exception 13) - what could be the root cause
              Phel Novice

              Unfortunately, today, after rolling back, I received another PSoD.

               

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab7868300) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab79b1f40) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab797c840) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab7877300) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab79a63c0) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: Cmd(0x459ab795d1c0) 0x2a, CmdSN 0x800e002f from world 2099954 to dev "t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0

              2019-11-10T06:08:27.043Z cpu2:2097656)0x0 0x0.

              2019-11-10T06:08:27.043Z cpu1:2097691)nvme:NvmeCore_GetCmdInfo:895:Queue [1] Command List Empty, DumpProgress: Faulting world regs Faulting world regs (01/14)

              DumpProgress: Vmm code/data Vmm code/data (02/14)

              DumpProgress: Vmk code/rodata/stack Vmk code/rodata/stack (03/14)

              DumpProgress: Vmk data/heap Vmk data/heap (04/14)

              DumpProgress: PCPU PCPU (05/14)

              2019-11-13T16:42:10.029Z cpu15:2097730)Dump: 3185: Dumped 1796 pages of recentMappings

              DumpProgress: World-specific data World-specific data (06/14)

              DumpProgress: VASpace VASpace (08/14)

              2019-11-13T16:42:12.101Z cpu15:2097730)HeapMgr: 1021: Dumping HeapMgr region for pageSize 0 with 54638 PDEs.

              2019-11-13T16:42:20.673Z cpu15:2097730)HeapMgr: 1021: Dumping HeapMgr region for pageSize 0 with 0 PDEs.

              2019-11-13T16:42:20.673Z cpu15:2097730)XMap: 479: Dumping XMap region with 16384 PDEs.

              2019-11-13T16:42:25.350Z cpu15:2097730)VAArray: 799: Dumping VAArray region

              2019-11-13T16:42:25.354Z cpu15:2097730)Timer: 1420: Dumping Timer region with 8 PDEs.

              2019-11-13T16:42:25.416Z cpu15:2097730)FastSlab: 1062: Dumping FastSlab region with 32768 PDEs.

              2019-11-13T16:42:33.188Z cpu15:2097730)MPage: 707: Dumping MPage region

              2019-11-13T16:42:49.325Z cpu15:2097730)VAArray: 799: Dumping VAArray region

              2019-11-13T16:42:49.370Z cpu15:2097730)PShare: 3180: Dumping pshareChains region with 2 PDEs.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "WorldStore" [451a00000 - 459a00001] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "vmkStats" [45aa40000 - 45ca40000] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "pageRetireBitmap" [45ca40000 - 45ca40204] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "pageRetireBitmapIdx" [45ca80000 - 45ca80001] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "llswap" [45cac0000 - 465b40000] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)VASpace: 1218: VASpace "LPageStatus" [465b40000 - 465b40041] had no registered dump handler.

              2019-11-13T16:42:50.185Z cpu15:2097730)Migrate: 394: Dumping Migrate region with 65536 PDEs

              2019-11-13T16:42:52.881Z cpu15:2097730)VASpace: 1218: VASpace "XVMotion" [467b80000 - 467ba0000] had no registered dump handler.

              DumpProgress: PFrame PFrame (09/14)

              2019-11-13T16:42:52.901Z cpu15:2097730)PFrame: 5499: Dumping PFrame region with 33018 PDEs

              DumpProgress: Dump Files Dump Files (12/14)

              DumpProgress: Collecting userworld dumps Collecting userworld dumps (13/14)

              DumpProgress: Finalized dump header Finalized dump header (14/14)

              A photo can be found below:

              IMG_20191113_095857.jpg

               

               

              I will start troubleshooting each component individually: memtest for memory, CPU stress tests, and verifying firmware (I did upgrade the BIOS firmware, but as far as I know it is not possible to downgrade it).

               

              If all else fails, I will unfortunately have to start replacing components or pay for a support contract.
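
              As an aside on the ScsiDeviceIO lines in the log above: the fields `0x2a` and `H:0x0 D:0x28 P:0x0` can be decoded with a small script. The following is a minimal sketch in Python, assuming the `D:` field carries the raw SCSI status byte and the value after `Cmd(...)` is the SCSI opcode. The function name `decode_scsi_failure` and the lookup tables are hypothetical helpers written for this illustration, and the tables list only a few standard SCSI values.

```python
import re

# Partial lookup tables based on standard SCSI values (not exhaustive).
SCSI_OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
}

# Matches the "Cmd(...) <opcode>, ... to dev "<name>" failed H:.. D:.. P:.." part
# of a ScsiDeviceIO line as it appears in the log quoted in this thread.
LINE_RE = re.compile(
    r'Cmd\((?P<cmd>0x[0-9a-f]+)\) (?P<opcode>0x[0-9a-f]+),.*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<device>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)',
    re.IGNORECASE,
)


def decode_scsi_failure(line):
    """Return a dict describing a failed command, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    opcode = int(match.group("opcode"), 16)
    device = int(match.group("device"), 16)
    return {
        "device_name": match.group("dev"),
        # Fall back to the raw hex value for codes missing from the tables.
        "opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "host_status": int(match.group("host"), 16),
        "device_status": DEVICE_STATUS.get(device, hex(device)),
        "plugin_status": int(match.group("plugin"), 16),
    }


# One of the lines from the log above, split only for readability.
sample = (
    '2019-11-10T06:08:27.043Z cpu2:2097656)ScsiDeviceIO: 2994: '
    'Cmd(0x459ab7868300) 0x2a, CmdSN 0x800e002f from world 2099954 to dev '
    '"t10.NVMe____Samsung_SSD_970_PRO_1TB_________________EA43B39156382500" '
    'failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0'
)
result = decode_scsi_failure(sample)
print(result["opcode"], result["device_status"])
```

              Under those assumptions, every failed command in the log is a WRITE(10) to the Samsung 970 PRO that came back with device status TASK SET FULL, while the host and plugin status are both zero. That points at the device queue rather than the host adapter, but on its own it does not prove the drive caused the PSoD.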

              • 19. Re: ESXi 6.7u3 PSoD Reoccurrences (Exception 13) - what could be the root cause
                lukebes1010 Novice

                If the PSoDs started to appear after the BIOS update, that may be the exact reason. If not, I'd strip the host down to the bare minimum of resources before troubleshooting the hardware.
