2 Replies Latest reply on Feb 13, 2020 12:12 AM by andvm

    Power-on reset Events

    andvm Enthusiast



      I have recently been noticing frequent events on one of the hosts that forms part of a vSAN cluster.


      [vob.scsi.scsipath.por] Power-on Reset occurred on naa.xxxxxxxxxxxxxx


      These events refer to several different disks, not just one, and appear to occur every few minutes.


      I SSH'd to the specific host and ran esxcli vsan storage list, and the disks match the vSAN disks.


      Any other commands I should issue to check further or any recommended actions?


      vSAN Health and server hardware all look green.


      Other servers in the cluster do not have such events.




        • 1. Re: Power-on reset Events
          TheBobkin Virtuoso
          vExpert, VMware Employees

          Hello andvm


          As you mention that you are seeing multiple disks getting reset, there could be an issue on the controller, on the cache-tier device of one affected Disk-Group or on a single Capacity-tier device (if the DG is deduped).

          Can you attach/PM the vmkernel.log and the output of vdq -Hi so we can see whether we can narrow this down?

          Other things to look for:

          - aberrant latency on some devices (e.g. grep -i increased /var/log/vmkernel.log OR esxtop 'u')

          - Indications of the controller driver/firmware having issues (dmesg OR less /var/log/vmkernel.log)

          - Validate that the controller and its driver+firmware are on the HCL. Please note that a Green Health check for this is not sufficient, as it won't check devices that are not on the HCL at all (it just assumes you are using them for non-vSAN purposes), so validate that it has actually checked the controllers in question.
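To see which devices are resetting most often, the events can be tallied per device. This is a minimal sketch, assuming the log lines carry the same text as the event quoted in the original post ("Power-on Reset occurred on naa...."). The function name count_por_events is hypothetical, and the log path is passed in rather than assumed.

```shell
# Hypothetical helper: tally "Power-on Reset occurred on <device>" lines per device.
# Assumes the log text matches the event message quoted in this thread.
count_por_events() {
    # $1: path to the log file to scan
    grep -o 'Power-on Reset occurred on [^ ]*' "$1" \
        | awk '{ print $NF }' \
        | sort | uniq -c | sort -rn
}
```

It could be run as count_por_events /var/log/vmkernel.log, or against whichever log on the host records these events. Resets spread evenly across all disks behind one controller would point more towards the controller than towards a single device.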



          • 2. Re: Power-on reset Events
            andvm Enthusiast

            Hi TheBobkin


            My findings so far...



            HBA firmware/driver matches vSAN VCG


            I ran the commands, and the disks showing the power-on reset issue in the logs match the disks listed in vdq -Hi
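That comparison can also be done mechanically. This is a rough sketch, assuming both the log and a saved copy of the vdq -Hi output refer to devices by their naa. identifiers. The helper names naa_ids and por_devices_not_in are hypothetical, and both file paths are supplied by the caller.

```shell
# Hypothetical helper: print the unique naa.* identifiers found in a file.
naa_ids() {
    grep -o 'naa\.[0-9a-zA-Z]*' "$1" | sort -u
}

# Print devices that have a Power-on Reset logged in $1 but are absent from $2
# (for example, a file holding saved `vdq -Hi` output).
por_devices_not_in() {
    for dev in $(grep 'Power-on Reset occurred on' "$1" \
                     | grep -o 'naa\.[0-9a-zA-Z]*' | sort -u); do
        # Whole-line, fixed-string match so one ID cannot match a longer one.
        naa_ids "$2" | grep -q -x -F "$dev" || echo "$dev"
    done
}
```

Empty output means every device with a logged reset also appears in the second file.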


            I do see a few events relating to increased latency around the time the issue started (however, I do not see particularly high latency spikes in the VM/Backend graphs)


            In vmkernel.log, amongst others, I see events related to lsi_mr3 megasas_hotplug_work etc. I will need to check further, as I am seeing that a VMware Knowledge Base article might be related, although the servers are running ESXi 6.5U2


            I am thinking of placing the host in MM and rebooting it, then leaving it running for a few days to see whether I get similar events and, if so, raising it with GSS