16 Replies Latest reply on Jun 24, 2019 12:18 PM by oneilv

    ESXi Upgrade to 6.7 U1 causing VMs to restart

    oneilv Novice
    vExpert

      Hey guys,

       

      We recently upgraded all the servers in our cluster from ESXi 6.5 to ESXi 6.7 Update 1 (EP7 build), and since then multiple VMs (Linux VMs) have been restarting at random. The reboots started with the upgrade and also occurred during vMotion of VMs to other hosts during the upgrade. Even today, 4 days after the upgrade, some VMs are rebooting with the error below.

       

      The error we have seen in the events is:

      vmware esx unrecoverable error (vcpu-2) vmk: unable to decompress BPN (I've attached a screenshot as well)
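
      For anyone triaging the same symptom across many VMs, here is a minimal sketch of how collected logs could be searched for the error text quoted above. It assumes the logs have already been copied into a local folder and that the file names start with "vmware" and end with ".log"; the folder layout and the function names are hypothetical, not part of any VMware tooling.

```python
from pathlib import Path

# Error text quoted in the post above; matched case-insensitively.
ERROR_TEXT = "unable to decompress bpn"


def find_error_lines(log_text, needle=ERROR_TEXT):
    """Return (line_number, line) pairs whose text contains the needle."""
    needle = needle.lower()
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line.lower()
    ]


def scan_logs(root):
    """Map each vmware*.log file under root to its matching lines.

    The directory layout is an assumption: logs are expected to have been
    downloaded from the datastore into a local folder beforehand.
    """
    hits = {}
    for path in Path(root).rglob("vmware*.log"):
        matches = find_error_lines(path.read_text(errors="replace"))
        if matches:
            hits[str(path)] = matches
    return hits
```

      Calling scan_logs on the folder holding the collected logs returns only the files that contain the message, which makes it easier to see which VMs hit the same fault.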

       

      Has anyone come across this?

       

      Cheers,

       

      Onil Varghese

        • 1. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
          ThompsG Master

          Hi oneilv,

           

          Are you able to attach the vmware.log from one of the Linux machines from when it restarted? Please attach the log file rather than copying and pasting its contents here.

           

          Also make sure it is from when the VM restarted.

           

          Kind regards.

          • 3. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
            oneilv Novice
            vExpert

            VM logs attached as requested ThompsG

            • 4. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
              pragg12 Enthusiast

              Hi,

               

              Are the ESXi hosts certified/supported for ESXi 6.7 U1?

              Are the hardware BIOS, hardware components' firmware/driver versions in line with 6.7 U1?

              Is the issue observed for VMs on a particular ESXi host or on all ESXi hosts in the cluster?

              What VM hardware version is in use on the affected VMs?

              Is one particular Linux distribution repeatedly affected, or are all Linux VMs facing the issue?

              Are there VMs running other operating systems, and have any odd issues been reported on them?

              • 5. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                oneilv Novice
                vExpert

                Hey pragg12,

                 

                The hosts are certified and supported for ESXi 6.7 U1, and all BIOS, drivers and firmware were supported for 6.7 U1 as well.

                The latest suggestion from GSS is to update the drivers and firmware to the latest available versions. We are now actioning this and will monitor the VMs to see if there are any changes.

                 

                I am still intrigued, though, as to why an old driver or firmware would cause VMs to reboot, especially the Linux ones.

                 

                I'll keep you guys posted on this one.

                 

                Cheers, Onil

                • 6. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                  oneilv Novice
                  vExpert

                  Hi Guys,

                   

                  After applying the drivers and upgrading firmware on the hosts, we are still seeing VMs being sporadically restarted by HA. The case has been escalated to a P1 with GSS, and after a couple of phone calls GSS confirmed that 5 other customers are reporting the same issue on hardware from other vendors. We have other clusters in the same environment that are not impacted, and other customers running vSphere 6.7 U1 who are not impacted by this issue either.

                  GSS is still working on the root cause, but it looks like it could be due to an issue when VMs are migrated from ESXi 6.5 to ESXi 6.7.

                   

                  Further updates to follow

                   

                  Cheers, Onil

                  • 7. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                    oneilv Novice
                    vExpert

                    Hey Guys,

                     

                    An update on this issue - Engineering have an ESXi patch that provides additional debugging that they'd like to apply. Of the 6 SRs for this issue, one customer has applied the patch above and 4 have reverted their environments to 6.5.

                     

                    Further updates to follow.

                     

                    Cheers, Onil

                    • 8. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                      pragg12 Enthusiast

                      Thanks for keeping the thread alive. Looking forward to seeing what the Engineering team finds here.

                      • 9. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                        oneilv Novice
                        vExpert

                        Hey guys,

                         

                        One thing I've confirmed with the team here is that all the VMs that are getting restarted are running virtual hardware version 10 or below. Not sure if this

                        This information is being sent to the VMware engineering team to be added into the RCA.
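
                        A minimal sketch of how an inventory export could be filtered for VMs at hardware version 10 or below. It assumes the version is available as an identifier of the form "vmx-NN" and that the inventory has already been exported to a simple name-to-version mapping; the VM names and function names below are made up for illustration.

```python
def hardware_version_number(version):
    """Convert an identifier such as 'vmx-10' to the integer 10."""
    prefix, _, number = version.partition("-")
    if prefix != "vmx" or not number.isdigit():
        raise ValueError(f"unexpected hardware version: {version!r}")
    return int(number)


def vms_at_or_below(inventory, max_version=10):
    """Return the sorted names of VMs whose hardware version is <= max_version."""
    return sorted(
        name
        for name, version in inventory.items()
        if hardware_version_number(version) <= max_version
    )


# Hypothetical inventory; real data would come from an export of the cluster.
example_inventory = {"web01": "vmx-10", "db01": "vmx-13", "app01": "vmx-08"}
old_vms = vms_at_or_below(example_inventory)
```

                        Comparing that list against the VMs that were actually reset would show whether the correlation holds across the whole cluster.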

                         

                        Cheers, Onil

                        • 10. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                          pragg12 Enthusiast

                          Your response has prompted more questions from me:

                           

                          Have you tested upgrading an affected VM's hardware version to 13 or above to see whether the issue still occurs?

                          When you upgraded ESXi from 6.5 to 6.7, were the cluster EVC settings changed, or did you enable per-VM EVC, after which you started seeing this issue?

                          • 11. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                            oneilv Novice
                            vExpert

                            Hi pragg12

                             

                            The restart issue is occurring only once on every VM. So far we've had 90+ VMs restart but not one VM has been restarted more than once.

                            VMware engineering have confirmed that the crash/backtrace is the same for all the reported issues, with the memory fault, and that it has nothing to do with the hardware version.

                             

                            VMware GSS have also provided a patch for ESXi, which we will be rolling out to the cluster tonight. If the pattern continues, VMs will continue to get reset, which will give us the additional information engineering needs to look further into the issue.

                             

                            More updates to follow.

                             

                            Cheers, Onil

                            • 12. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                              pragg12 Enthusiast

                              Hi oneilv

                               

                              Do you have any further updates on this issue?

                              • 13. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                                oneilv Novice
                                vExpert

                                Hi pragg12,

                                 

                                The patch provided by VMware GSS has been applied to the cluster, and we are still seeing VMs being reset by HA. Logs for the corresponding VMs and hosts have been uploaded to the engineering team and they are currently reviewing them.

                                 

                                We hope to hear back from them soon, and I will post an update as soon as I do.

                                 

                                Cheers, Onil

                                • 14. Re: ESXi Upgrade to 6.7 U1 causing VMs to restart
                                  oneilv Novice
                                  vExpert

                                  Hey all,

                                   

                                  Just an update on this case: the customer asked us to roll back half the cluster to 6.5, as they could not tolerate further VMs crashing.

                                   

                                  VMware GSS provided us with a debug patch for 6.7 that adds the logging they need to investigate the issue further. However, all our attempts to install it on one host failed, and the build number on the host wasn't changing. After multiple phone calls with GSS to resolve this, VMware engineering supplied us with another image, which has actually worked, and the build number is now updated on the host. Unfortunately it's been so long that the customer has not experienced any VM HA reset events in the last 2 weeks.

                                   

                                  We are still planning to roll this out to all 6.7 hosts this week, and if we encounter the issue again we will upload logs to GSS.

                                   

                                  Cheers,

                                   

                                  Onil
