1 Reply Latest reply on Aug 18, 2019 3:44 AM by a.p.

    numerous Intermittent Network connection timeout from VM Guest to production LAN, update from vsphere 6.5U2 to U3, caused disaster-datastore lost

    ESTHERLEE6880 Lurker

      We are facing numerous intermittent network connection timeouts from VM guests to the production LAN. We have logged numerous cases with VMware support, and have also updated all the hardware drivers and iLO as advised by support.

      Lastly, we planned an update from vSphere 6.5 U2 to U3, and disaster happened: the datastore was wiped out, and we need to restore from backup.

      Hardware: HPE ProLiant DL380 Gen10 (2 physical hosts)

      Original vSphere version & build: vSphere 6.5 U2, build 10884925

      Original vCenter version & build: vCenter 6.5 U2c, build 9451637

       

      here's the flow :

      1. File downloaded: https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI65U3-HPE&productId=614 (Download HPE Custom Image for ESXi 6.5U3 Install CD).
      2. Update plan: update one ESXi host first (host#1), monitor it, and only then proceed with host#2. Before the update, all checks were done: all VMs were up and running, with no error messages.
      3. During the update there were no error messages or alerts, and everything reported success. However, after the reboot, host#1 required the end user to log in with the existing ID and password, instead of the usual request for the root ID and password.
      4. After the reboot, the build number of host#1 had become 9298722 (meaning it went backward instead of forward).
      5. Suddenly the datastore could not be located. We logged a case with the storage array vendor, who saw that the storage array still held data.
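
      The build regression in step 4 is the kind of thing that can be caught right after the reboot by comparing the build number recorded before the update with the one the host reports afterwards. A minimal sketch, assuming the build is read from the line printed by `vmware -v` (which ends in `build-<number>`); the function names are hypothetical, and the "before" value would have to be kept somewhere that survives the reboot:

```shell
#!/bin/sh
# Extract the numeric build from a version line such as
# "VMware ESXi 6.5.0 build-10884925" (the format printed by `vmware -v`).
build_from_version_line() {
    printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Succeed (exit 0) when the build after the update is lower than before.
# $1 = build before the update, $2 = build after the reboot.
build_went_backward() {
    [ "$2" -lt "$1" ]
}

# Intended use on the host (not executed here):
#   before=$(build_from_version_line "$(vmware -v)")
#   ... update and reboot ...
#   after=$(build_from_version_line "$(vmware -v)")
#   build_went_backward "$before" "$after" && echo "WARNING: host booted an older build"
```

      With the numbers from this post, `build_went_backward 10884925 9298722` succeeds, which is the signal to stop before touching host#2.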

       

      We logged a case with the array vendor and were told no data was lost. However, after we logged a case with VMware support and they connected remotely, their finding was: a datastore lost all the data on it.

      Root cause: the VMFS file system was overwritten with a FAT32 file system, most probably by a Windows machine.

       

      What we suspect is a software bug, given the flow described above: no error message, no alert, nothing.

      Has anyone else faced the same kind of issue?

       

      We would also like to know:

      1. Under what circumstances would the build number revert to an even older version (not the same as the previous build number) during an ESXi update?
      2. Is there any impact if the two ESXi hosts run different vSphere 6.5 U2 build numbers?
      3. What preventive steps can be taken? (We suspect a software bug in 6.5 U3 caused the disaster.)
        • 1. Re: numerous Intermittent Network connection timeout from VM Guest to production LAN, update from vsphere 6.5U2 to U3, caused disaster-datastore lost
          a.p. Guru
          vExpert, User Moderators, Community Warriors

          I unfortunately can't tell you what exactly happened, but it looks like there was an issue with the update (obviously), and ESXi reverted to a previously existing installation on the secondary bootbank, i.e. the version/build from which you updated before.
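
          To see which image each bootbank holds, the `boot.cfg` files under `/bootbank` and `/altbootbank` can be compared. A rough sketch, assuming each file consists of `key=value` lines that include `build=` and `updated=` (please verify this on your own host); `bootcfg_value` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the value of the first "key=value" line for the given key.
# $1 = path to a boot.cfg-style file, $2 = key name (e.g. build, updated).
bootcfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Intended use on the host (not executed here):
#   for bank in /bootbank /altbootbank; do
#       echo "$bank: build=$(bootcfg_value "$bank/boot.cfg" build)" \
#            "updated=$(bootcfg_value "$bank/boot.cfg" updated)"
#   done
```

          Checking this before and after an update shows which build the host would fall back to if the new image fails to boot.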

           

          I've been working with HPE hardware for many years, and here's how I usually update hosts if Update Manager isn't an option, or in small environments, where the hosts do not require additional drivers (ones which are not supplied with the vendor's image).

           

          Make sure that you download the correct Offline Bundle (.zip file), which is Gen9Plus in your case, and upload it to a folder on a datastore.

           

          Find out the profile name required for the update command:

          esxcli software sources profile list -d /vmfs/volumes/datastore/folder/offline-bundle.zip

          Run the upgrade/update (I recommend that you place the host into Maintenance Mode for the upgrade/update):

          esxcli software profile install -d /vmfs/volumes/datastore/folder/offline-bundle.zip -p <profilename> --ok-to-remove --dry-run

          Note: "--dry-run" can be used to evaluate what's going to happen without actually changing anything. To run the update, run the same command without "--dry-run".
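
          The dry run and the real run can be chained so that the install only happens if the dry run went through. This is just a sketch: the function name and the `ESXCLI` override variable are my own invention, and it assumes that esxcli returns a non-zero exit code when the dry run fails.

```shell
#!/bin/sh
# ESXCLI can be overridden (e.g. ESXCLI=echo) to preview the commands
# without touching the host. The variable name is hypothetical.
ESXCLI=${ESXCLI:-esxcli}

# Run the profile install as a dry run first; run it for real only if
# the dry run succeeded.
# $1 = absolute path to the offline bundle (.zip), $2 = profile name.
install_profile_checked() {
    depot=$1
    profile=$2
    $ESXCLI software profile install -d "$depot" -p "$profile" --ok-to-remove --dry-run || {
        echo "dry run failed, not installing" >&2
        return 1
    }
    $ESXCLI software profile install -d "$depot" -p "$profile" --ok-to-remove
}
```

          Calling it with `ESXCLI=echo` set prints the two command lines instead of running them, which is a cheap way to double-check the path and profile name first.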

           

          Once the update is done (this may take about 20 sec on an SSD, or up to ~2 min on an SD card), verify that the command returned with something like "The update completed successfully, but the system needs to be rebooted". If that's the case, reboot the host by typing "reboot" in the command line.
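
          If you script the update, the same check can be done on the captured output before rebooting. A minimal sketch, assuming the result text contains the phrase quoted above; `update_needs_reboot` is a hypothetical name, and the exact wording may differ between versions:

```shell
#!/bin/sh
# Succeed when the esxcli output passed on stdin asks for a reboot.
update_needs_reboot() {
    grep -q 'needs to be rebooted'
}

# Intended use on the host (not executed here):
#   out=$(esxcli software profile install -d "$depot" -p "$profile" --ok-to-remove) || exit 1
#   printf '%s\n' "$out"
#   if printf '%s\n' "$out" | update_needs_reboot; then
#       reboot
#   fi
```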

           

          If the host doesn't automatically connect to vCenter after the update (this may take a minute or so, after it has rebooted), then right click the host, and select Reconnect, which should then work without the need to enter credentials.

           

          André