kwg66
Hot Shot

vSphere 6.5 slow boot

I am seeing boot times in excess of 20 minutes. The boot process hangs for over 12 minutes just after loading "vmw_vaaip_hds". I'm not sure whether, after loading vmw_vaaip_hds, it is executing some activity based on this module or trying to load the next service, which is "gss". I haven't been able to find details as to what gss is. Other than this one hang, the boot is fairly smooth.

I have read this may be because of the use of RDMs, but I'm not completely convinced. 

Can anyone shed light on what could be causing vSphere to take so long to boot, and whether and how it can be mitigated?

Accepted Solutions
pwilk
Hot Shot

Was the boot time always this slow or has it slowed down recently?

If it's a new issue in your environment, then it is most likely caused by RDMs. You can follow the guide in the VMware Knowledge Base to troubleshoot this.

Cheers, Paul Wilk

6 Replies
msripada
Virtuoso

When the process hangs at "vmw_vaaip_hds", it means that either this module is still loading or the next step is still in progress. You can press Alt+F12 on the DCUI console and watch the live vmkernel log to see what is happening. Most of the time the host is stuck in the storage scan process, which is explained in KB 1016106. I am sure the live vmkernel log will shed some light on the slow boot issue.

Thanks,

MS

kwg66
Hot Shot

OK, I looked at the KB - horribly written, but clearly pointing to applying a setting to the LUN: "--perennially-reserved=true".

I say the KB is horribly written because it doesn't really explain what this setting does, or what other ramifications might exist as a result of changing it to --perennially-reserved=true.

Before I apply this change to a production environment, can you shed light on it? What is it doing during boot, and what might the ramifications be once the system is running live?

Also, correct me if I'm wrong, but it looks as though this command needs to be run on every host in the cluster, and also for every RDM, is this the case?

kwg66
Hot Shot

I just tested our older hosts that see the same storage; the boot time is the same, very long, about 20 minutes. (Current prod is running on 5.5. I just built out a new cluster of 6.5 and am getting ready to migrate all workloads to it. Both clusters see the same LUNs until the migrations are completed.)

Followed link to KB, see below.

msripada
Virtuoso

Yes. The change should be made on all hosts and for all the RDMs. You might also consider upgrading the drivers/firmware on the ESXi hosts for the HBAs or NICs (if using iSCSI) as per the compatibility guide.

Thanks,

MS
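
Since the setting has to be applied per host and per RDM device, the command list grows quickly. Below is a minimal Python sketch that only builds the command strings for review; it does not run anything. The host names and naa IDs are placeholders, not values from this thread, and the `--server` prefix assumes the commands are issued remotely from the vSphere CLI (drop it when running esxcli in a shell on the host itself).

```python
from itertools import product


def build_commands(hosts, devices, reserved=True):
    """Return one esxcli setconfig command per (host, device) pair."""
    flag = "true" if reserved else "false"
    return [
        f"esxcli --server {host} storage core device setconfig "
        f"-d {device} --perennially-reserved={flag}"
        for host, device in product(hosts, devices)
    ]


# Placeholder inventory (hypothetical names, replace with your own).
hosts = ["esx01.example.local", "esx02.example.local"]
devices = [
    "naa.60000000000000000000000000000001",
    "naa.60000000000000000000000000000002",
]

for command in build_commands(hosts, devices):
    print(command)
```

Passing `reserved=False` produces the matching commands to revert the flag, which is worth keeping next to the originals before touching production.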

kwg66
Hot Shot

I just ran a script to collect RDM info.

I added the collected info to a text file and compiled a string of esxcli commands.

I ran the commands in the vSphere CLI against a host that doesn't run any VMs but does see the LUNs used as RDMs, since it belongs to the same storage group as all hosts in the cluster.

Everything was successful after dealing with the thumbprint issue when connecting to the host.

Rebooted - wow, the boot was so fast my head is still spinning.
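
Before rebooting, the flag can be confirmed per device with `esxcli storage core device list -d <naa id>`, which reports an "Is Perennially Reserved" line. A minimal Python sketch for checking captured output follows; the sample text is an abbreviated, made-up excerpt with a placeholder naa ID, not output from these hosts.

```python
def is_perennially_reserved(device_list_output):
    """Return True/False from the 'Is Perennially Reserved' line, or None if absent."""
    for line in device_list_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Is Perennially Reserved":
            return value.strip().lower() == "true"
    return None


# Abbreviated, hypothetical excerpt of `esxcli storage core device list -d <naa id>`.
sample = """naa.60000000000000000000000000000001
   Size: 102400
   Device Type: Direct-Access
   Is Perennially Reserved: true
"""

print(is_perennially_reserved(sample))
```

A `None` result means the line was not found at all, which usually points to a wrong device ID or truncated output rather than the flag being off.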

I will run this across all of my new hosts, but I am not going to run it against the older hosts where the VMs currently run that own the RDMs.  

Once all the VMs are migrated to the new cluster, will there be any impact from the perennially reserved setting on the VMs that own the RDMs, or on the hosts that house those VMs?

Thanks
