VMware Cloud Community
ESTHERLEE6880
Contributor

Numerous intermittent network connection timeouts from VM guests to production LAN; update from vSphere 6.5 U2 to U3 caused a disaster (datastore lost)

We are facing numerous intermittent network connection timeouts from VM guests to the production LAN. We've logged numerous cases with VMware support and also updated all the hardware drivers/iLO as advised by support.

Lastly, we planned an update from vSphere 6.5 U2 to U3, and disaster happened. The datastore was wiped out and now needs to be restored from backup.

Hardware: HPE ProLiant DL380 Gen10 (2 physical hosts)

Original vSphere version & build: vSphere 6.5 U2, build 10884925

Original vCenter version & build: vCenter 6.5 U2c, build 9451637

Here's the flow:

  1. The file downloaded: https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI65U3-HPE&productId=614 (Download HPE Custom Image for ESXi 6.5U3 Install CD).
  2. Update plan: update one ESXi host first (host#1), monitor it, and only then proceed with host#2. Before the update, all checks were done; all VMs were up and running, with no error messages.
  3. During the update there was no error message or alert; everything reported success. However, after the reboot, host#1 required the end user to log in with the existing ID and password, instead of the usual prompt for the root ID and password.
  4. After the reboot, we noticed that the build number of host#1 had become build 9298722 (which means it went backward instead of forward).
  5. Suddenly the datastore could not be located. We logged a case with the storage array vendor, and the storage array was seen to still hold data.
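
The build regression in step 4 is the kind of thing a small check can catch before host#2 is touched. The following is only a sketch: `check_build` is a hypothetical helper name, not a VMware tool, and it assumes that a lower build number within the same 6.5 release line means an older image. On a host, the two numbers could be read with `esxcli system version get` or `vmware -vl` before and after the reboot; here they are passed in as arguments.

```shell
#!/bin/sh
# check_build BEFORE AFTER
# Hypothetical helper: compares the build number recorded before an update
# with the one reported after the reboot and flags a step backward.
check_build() {
    before="$1"
    after="$2"
    if [ "$after" -gt "$before" ]; then
        echo "OK: build advanced from $before to $after"
    elif [ "$after" -eq "$before" ]; then
        echo "UNCHANGED: still on build $before"
    else
        echo "ROLLBACK: build went from $before down to $after"
        return 1
    fi
}

# Build numbers reported in this post for host#1 (before update, after reboot).
check_build 10884925 9298722 || echo "Do not proceed with host#2 until this is explained."
```

With the numbers from this post, the helper reports a rollback and returns a non-zero status, so a wrapper script can stop instead of continuing with the second host.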

We logged a case with the array principal and were told no data was lost. However, after we logged a case with VMware support and they connected remotely, their finding was: a datastore lost all the data on it.

Root cause: the VMFS filesystem was overwritten, most probably by a Windows machine with a FAT32 file system.

What we suspect is a software bug, as described in the flow above: no error message, no alert, nothing.

We need to check whether anyone else has faced the same kind of issue.

We would also like to know:

  1. Under what circumstances would the build number revert to an even older version during an ESXi update (not the same as the previous build number)?
  2. Is there any impact if the two ESXi hosts run different vSphere 6.5 U2 build numbers?
  3. What prevention steps can be taken? (We suspect a software bug in 6.5 U3 caused the disaster.)
a_p_
Leadership

I unfortunately can't tell you what exactly happened, but it looks like there was an issue with the update (obviously), and ESXi reverted to a previously existing installation on the secondary bootbank, i.e. the version/build from which you updated before.

I've been working with HPE hardware for many years, and here's how I usually update hosts if Update Manager isn't an option, or in small environments, where the hosts do not require additional drivers (ones which are not supplied with the vendor's image).

Make sure that you download the correct Offline Bundle (.zip file), which is Gen9Plus in your case, and upload it to a folder on a datastore.

Find out the profile name required for the update command:

esxcli software sources profile list -d /vmfs/volumes/datastore/folder/offline-bundle.zip

Run the upgrade/update (I recommend that you place the host into Maintenance Mode for the upgrade/update):

esxcli software profile install -d /vmfs/volumes/datastore/folder/offline-bundle.zip -p <profilename> --ok-to-remove --dry-run

Note: "--dry-run" can be used to evaluate what's going to happen without actually changing anything. To run the update, run the command without "--dry-run"

Once the update is done (this may take about 20 sec on an SSD, or up to ~2 min on an SD card), verify that the command returned with something like "The update completed successfully, but the system needs to be rebooted". If that's the case, reboot the host by typing "reboot" in the command line.
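
If the update is driven from a script rather than typed by hand, the same check on the result message can be automated. This is a minimal sketch, assuming the message text is as quoted above (it says "something like", so the exact wording on a given build may differ); `update_result` is a hypothetical helper name, and the sample line fed to it is the quoted text, not output captured from a real host.

```shell
#!/bin/sh
# update_result: hypothetical helper that reads the update command's output
# on stdin and classifies it by looking for the success message quoted above.
update_result() {
    output=$(cat)
    case "$output" in
        *"The update completed successfully"*"needs to be rebooted"*)
            echo "reboot-required" ;;
        *"The update completed successfully"*)
            echo "success" ;;
        *)
            # Anything else should be read by a human before rebooting.
            echo "check-output"
            return 1 ;;
    esac
}

# Example using the message as quoted in this reply.
echo "The update completed successfully, but the system needs to be rebooted" | update_result
```

On a host, the output of the install command shown earlier could be piped into `update_result`, and the reboot issued only when it prints "reboot-required".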

If the host doesn't automatically connect to vCenter after the update (this may take a minute or so after it has rebooted), then right-click the host and select Reconnect, which should then work without the need to enter credentials.

André
