aaronwsmith's Posts

Question: For the ixgbe driver, why would the Rx Max change from 4096 to 240 when the MTU is changed from 1500 to 9000?  Wouldn't that make the Rx buffer ineffective for handling Jumbo Frames?

We have two clusters with the same hardware specs, built with the same ESXi image and the same ixgbe async driver, and the Intel 82599 10 Gbps NICs have the same firmware version installed.  But in one cluster, ethtool indicates the Rx maximum cannot be set above 240, and in the other it indicates 4096.  I found the reason is the MTU size.  On the host where Rx Max == 4096, MTU == 1500, whereas on the host where Rx Max == 240, MTU == 9000.

** Example host where I get the option to set Rx maximum to 4096:

[~] ethtool -g vmnic0
Ring parameters for vmnic0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             240
RX Mini:        0
RX Jumbo:       0
TX:             1024

[~] ethtool -i vmnic0
driver: ixgbe
version: 4.5.1-iov
firmware-version: 0x800007f4, 17.5.10
bus-info: 0000:01:00.0

[~] esxcli network nic list
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ------------------------------------------------------
vmnic0  0000:01:00.0  ixgbe   Up            Up           10000  Full    ec:f4:bb:xx:xx:xx  1500  Intel(R) 82599 10 Gigabit Dual Port Network Connection
vmnic1  0000:01:00.1  ixgbe   Up            Up           10000  Full    ec:f4:bb:xx:xx:xx  1500  Intel(R) 82599 10 Gigabit Dual Port Network Connection

[~] vmware -l
VMware ESXi 6.0.0 Update 3

** Example host where I don't seem to have the ability to set Rx maximum above 240:

[~] ethtool -g vmnic0
Ring parameters for vmnic0:
Pre-set maximums:
RX:             240
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             240
RX Mini:        0
RX Jumbo:       0
TX:             1024

[~] ethtool -i vmnic0
driver: ixgbe
version: 4.5.1-iov
firmware-version: 0x800007f4, 17.5.10
bus-info: 0000:01:00.0

[~] esxcli network nic list
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ------------------------------------------------------
vmnic0  0000:01:00.0  ixgbe   Up            Up           10000  Full    ec:f4:bb:xx:xx:xx  9000  Intel(R) 82599 10 Gigabit Dual Port Network Connection
vmnic1  0000:01:00.1  ixgbe   Up            Up           10000  Full    ec:f4:bb:xx:xx:xx  9000  Intel(R) 82599 10 Gigabit Dual Port Network Connection

[~] vmware -l
VMware ESXi 6.0.0 Update 3

On the host where Rx Max == 240, if I change the MTU from 9000 --> 1500, the Rx Max changes from 240 --> 4096.  Just an FYI, as it took me a while to figure out this behavior.

But I'm unsure then how the Rx ring buffer is useful for Jumbo Frames at 240 bytes?
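For comparing hosts across the two clusters, a minimal Python sketch that parses `ethtool -g` output into numbers may help. The sample text is copied from the MTU 9000 host above; the function names `parse_ring_params` and `rx_at_maximum` are made up for illustration. One note that may bear on the question: as far as the ethtool documentation goes, the ring values are counts of ring entries (descriptors), not byte sizes.

```python
import re

# Output copied from the MTU 9000 host in the post above.
SAMPLE = """\
Ring parameters for vmnic0:
Pre-set maximums:
RX:             240
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             240
RX Mini:        0
RX Jumbo:       0
TX:             1024
"""

# Matches lines such as "RX Mini:        0".
FIELD = re.compile(r"^([A-Za-z ]+):\s+(\d+)$")


def parse_ring_params(text):
    """Split `ethtool -g` output into its 'maximums' and 'current' sections."""
    result = {"maximums": {}, "current": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "maximums"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section is not None:
            match = FIELD.match(line)
            if match:
                result[section][match.group(1)] = int(match.group(2))
    return result


def rx_at_maximum(params):
    """True when the current RX ring size already equals the reported maximum."""
    return params["current"]["RX"] >= params["maximums"]["RX"]


if __name__ == "__main__":
    params = parse_ring_params(SAMPLE)
    print(params["maximums"], params["current"], rx_at_maximum(params))
```

Feeding it the output from each host makes it easy to spot which ones report the reduced RX maximum.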
We just succeeded in getting SSPI to work so that PowerCLI 6.5 passes Kerberos credentials through to vCenter 6.5 (in our case with an external PSC).

1. If the PSC is external, ensure it's joined to AD -- Join the vCenter Server Appliance to an Active Directory Domain.  Reboot the PSC appliance.

2. For the vCenter Appliance, you must also join it to AD via the CLI (only if the PSC is external) -- The option to join vCenter Server Appliance 6.x to an Active Directory domain is unavailable in the vSphere Web Client (…

3. If the domain you're joining differs from the DNS domain in the vCenter's FQDN, you'll need to create a matching Service Principal Name (SPN) for the vCenter's Computer Account.  Otherwise SSPI will fail to create the security context it needs to log in to the machine account and pass through your credentials.

In our case, #3 was the missing piece.  Our vCenter was in a separate DNS domain (xxx.umn.edu) from AD (yyy.umn.edu).  By default, 2 SPNs are created under the Computer Account in AD (at least in our case):

<vCenter-Hostname>
<vCenter-Hostname>.<yyy.umn.edu -- The AD Domain>

So for us, a 3rd SPN was needed:

<vCenter-Hostname>.<xxx.umn.edu -- Our Separate DNS Domain>

It's easiest to add the missing SPN from the command line on the Domain Controller (or any Windows machine with the AD Tools installed/enabled):

setspn -A "HOST/<vCenter-Hostname>.<domain-name>" <vCenter-HostName>

Example:

setspn -A "HOST/myvCenter.xxx.umn.edu" myvCenter

Then list the SPNs associated with the Computer Account to confirm:

setspn -l <AD-Domain>\<vCenter-Hostname>

Example:

setspn -l yyy.umn.edu\myvCenter

4. Reboot the vCenter Appliance.  This will ensure there is sufficient time for the AD Domain Controllers to replicate the new Computer Account and its custom SPN addition.

Hope this helps!
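The SPN logic in step 3 can be sketched in a few lines of Python. This is only an illustration that builds the `setspn` command strings shown above; it does not talk to AD, and the function names `expected_host_names` and `missing_spn_command` are hypothetical.

```python
def expected_host_names(hostname, ad_domain, dns_domain):
    """Names that should appear as SPNs on the vCenter's Computer Account.

    The first two are the defaults described in the post; the third is only
    needed when the vCenter's DNS domain differs from the AD domain.
    """
    names = [hostname, f"{hostname}.{ad_domain}"]
    if dns_domain.lower() != ad_domain.lower():
        names.append(f"{hostname}.{dns_domain}")
    return names


def missing_spn_command(hostname, ad_domain, dns_domain):
    """Return the setspn command for the extra SPN, or None if none is needed."""
    if dns_domain.lower() == ad_domain.lower():
        return None
    return f'setspn -A "HOST/{hostname}.{dns_domain}" {hostname}'


if __name__ == "__main__":
    # Values taken from the example in the post.
    print(expected_host_names("myvCenter", "yyy.umn.edu", "xxx.umn.edu"))
    print(missing_spn_command("myvCenter", "yyy.umn.edu", "xxx.umn.edu"))
```

With the example values from the post, the second call reproduces the `setspn -A "HOST/myvCenter.xxx.umn.edu" myvCenter` command.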
I just resolved this issue in our environment.  The ESXi hosts had external-CA signed certificates installed in /etc/vmware/ssl/rui.crt, but the /etc/vmware/ssl/castore.pem file was empty (the hosts were upgraded from ESXi 5.5 -> 6.x).  It needed the base-64/PEM encoded certificates of the CA (and in our case also the intermediate CA) added to this file.  You can chain them together in the same file.  Once I did this on all ESXi hosts in the cluster, the stats started collecting.

The single ESXi host that is NOT listed as having issues contributing stats is the designated stats master for the cluster.  The RVC command vsan.perf.cluster_info can show you which hosts are the stats and CMMDS masters in the cluster, and whether any issues exist with stats collection.  In this case, no issues were identified, but stats couldn't be shared with the master because the SSL certificates from each host couldn't be verified without the CA certs residing in the above-mentioned castore.pem file.

I'll publish a blog with details as soon as I can.  Meanwhile, I hope this quick-summary fix helps everyone here.
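Chaining the CA certificates into one file amounts to concatenating their PEM blocks. A minimal Python sketch of that step follows, assuming the CA and intermediate certificates are already available as PEM text; `build_castore` is a hypothetical helper name, and the sketch only returns the combined text rather than writing to /etc/vmware/ssl/castore.pem.

```python
import re

# One PEM certificate block, from the BEGIN line through the END line.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.+?\s-----END CERTIFICATE-----",
    re.DOTALL,
)


def build_castore(pem_texts):
    """Concatenate every PEM certificate block found in the given texts.

    Raises ValueError if an input holds no certificate block, which catches
    the common mistake of passing a DER (binary) file or an empty file.
    """
    blocks = []
    for text in pem_texts:
        found = PEM_BLOCK.findall(text)
        if not found:
            raise ValueError("input contains no PEM certificate block")
        blocks.extend(found)
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Placeholder bodies; real certificates hold base-64 encoded DER data.
    intermediate = "-----BEGIN CERTIFICATE-----\nQUJD\n-----END CERTIFICATE-----\n"
    root = "-----BEGIN CERTIFICATE-----\nREVG\n-----END CERTIFICATE-----\n"
    print(build_castore([intermediate, root]), end="")
```

The helper only checks that each input looks like PEM; it does not verify that the certificates actually form a valid chain.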
I had the same issue.  Took me a while but we finally found the solution ... PSC/SSO certificate expired, and psc-client (which hosts the /psc URL) refused to connect as a result.  In our case, this "ssoserver" certificate was replaced in vCenter 5.5 with a CA-signed version, then during the upgrade to vCenter 6, carried over and ultimately expired.  Wrote a blog explaining the backstory, findings and resolution.  See if this is applicable for you: Troubleshooting Expired PSC Certificates with vSphere 6 | Virtually Understood Hope this helps!
Wow, this is unfortunate for the workaround, but I greatly appreciate the link to the release notes where the issue is documented as a known problem!  If nothing else, it at least puts to rest our attempts to resolve it.  We work around the problem by logging in with the administrator@vsphere.local account when we need to see this information under System Configuration, which is a pain but better than nothing.  Thanks!
Thanks for the update Darren!  I'll confirm this resolves the issue for us once we upgrade to 6.0U2 in our VSAN environment.
I figured out the problem, and documented the findings in my blog post here: Troubleshooting Expired PSC Certificates with vSphere 6 | Virtually Understood
You can review this KB, which provides all the ports for connectivity between components in vSphere 6: https://kb.vmware.com/kb/2131180

However, if you look, you'll see you'd need to block inbound HTTPS/443 from client devices to prevent the C# client from working.  The problem is that this would also block connectivity to the reverse proxy on vCenter, which would essentially prohibit access to many HTTPS facilities in vCenter.

Perhaps you should approach this differently ... VMware is going to (finally) terminate the C# client, and 40%+ of customers have deployed the HTML5 web client fling into production.  So you could try the HTML5 client and show users that C# is not going to be available long term: Goodbye vSphere Client for Windows (C#) – Hello HTML5 - VMware vSphere Blog - VMware Blogs
Have you increased vCenter's log level to verbose to help troubleshoot? VMware KB: Increasing VMware vCenter Server and VMware ESX/ESXi logging levels

I'm guessing this may have been tried by GSS, but if not, you could see if increasing the heartbeat timeout between vCenter and the ESXi hosts helps resolve the issue: https://kb.vmware.com/kb/1005757 (note the symptoms described in this KB require verbose logging for vCenter to identify the missed heartbeat messages).

Are your vMotion and Management vmk# ports on separate uplinks, or are they shared?  I've seen on 1 Gbps links that if vMotion + Management traffic share the same wire, multiple vMotions (for example, from putting a host in maintenance mode) will cause ESXi hosts to drop because the uplink is congested with vMotion traffic, preventing heartbeat UDP packets from getting through.
Hi Everyone,

We have 2 Windows based vCenters in our environment, with the following Windows + vCenter version configuration:

Windows Server 2008 R2 with vCenter 6.0 Update 1 (Embedded PSC Architecture)
Windows Server 2012 (Not R2) with vCenter 6.0 Update 2 (Embedded PSC Architecture)

On both of these vCenters, we've replaced the machine certificates with 3rd party CA signed certificates (in the case of the vCenter 6.0U2, this was done while at version 6.0U1).

For some reason, when trying to apply the fixes for broken SSL trust anchors documented in VMware KB: vCenter Server or Platform Services Controller certificate validation error for external VMware Solutions… , when accessing the lookup service's MOB interface @ https://<vcenter-FQDN>/lookupservice/mob, I'm repeatedly prompted for credentials to log in.  I've tried the administrator@vsphere.local account as well as AD integrated accounts that have admin access in SSO.  In both cases these IDs can log in to the PSC / vCenter fine, but nothing seems to authenticate with the MOB interface for the lookup service.

On both vCenters as well, if I try to access the PSC web interface via https://<vcenter-FQDN>/psc, I receive an error:

HTTP Status 400 - An error occurred while sending an authentication request to the PSC Single Sign-On server - null
type Status report
message An error occurred while sending an authentication request to the PSC Single Sign-On server - null
description The request sent by the client was syntactically incorrect.

I am confident both symptoms are related, as we have other Windows based vCenters running 6.0 Update 1 where this is not an issue.  However, I've been at a loss to identify the cause or find any logs that correlate to the errors for the PSC web interface or the failed authentications to the MOB.

I would greatly appreciate any tips or guidance on how to solve this issue.  Without being able to log in to the MOB, we're unable to resolve the SSL certificate issues that I described above.
Thanks much in advance!
For anyone searching the web for the elusive VSAN Health Check Plugin 503 error, note another possible cause discussed in this thread: Re: Problem with VSAN Health Check windows server (Unexpected status code: 503)

Dell OpenManage integration seems to break VSAN Health Check.  In my experience this was true for VSAN 6.1 atop both vCenter on Windows and the vCenter Appliance.  Unregistering OpenManage from the affected vCenters, in addition to verifying the CA certificates via the 3 KBs I noted in my reply to the above discussion, resolved the issue for us.
Thanks much for this tip!  We had this problem with 2 of our VSAN 6.1 enabled vCenters.  Initially the problem was the CA Certificate Replacement issues documented in the following 3 KBs:

Embedded PSC: https://kb.vmware.com/kb/2121689
VSAN: https://kb.vmware.com/kb/2128353
ESX Agent Manager: https://kb.vmware.com/kb/2112577

But we also had Dell OpenManage integrated into both vCenters and were continuing to see the HTTP 503 error.  After unregistering the vCenters from the Dell OpenManage Admin portal and restarting the VSAN Health Check service, it began to work again!
When I log in to our vCenter 6 Update 1 Appliance (embedded PSC) via the Web Client with an Active Directory Integrated ID, select "System Configuration", then select the vCenter node, the summary screen does not display any IP or Hostname information.  Additionally, uptime and workload status for storage/memory/swap/workload all display "Unknown."  See attached picture for reference.  Any values under Manage -> Settings are also blank.

When I log in with administrator@vsphere.local, all the above-mentioned values are displayed and I can modify the settings for the vCenter Appliance as expected.  This made me suspect a permission issue exists, but I'm unable to figure out which SSO group to add our AD accounts under to enable management/monitoring of the vCenter appliance.

I already tried adding our accounts to the following groups in SSO under the vsphere.local domain:

DCAdmins
SystemConfiguration.Administrators
ComponentManager.Administrators
Administrators

This information on each group's purpose unfortunately did not help me understand which one(s) might be applicable: Groups in the vsphere.local Domain

Thank you very much for the help!
Hi, does anyone know if there are published guides on supported Firmware/BIOS stacks for Dell hardware running ESXi 5.5 (e.g. M620)?  I know HP does this with their solution recipe guides that tie together ESXi builds/drivers with SPP firmware sets to create supported configurations, but I've yet to find anything from Dell.  Any links/guidance on this would be outstanding!  Thanks in advance!
Here's a quick helper script I wrote that helps generate a report of specific properties of SCSI LUNs attached to a VMHost.  Script supports 2 parameters, one for the VMHost and another for the HBA type you wish to target.  Lots of room for improvement (e.g. parameter to target an HBA by name.)  Main goal was to produce a report that ties together the LUN ID, Canonical Name and Console Device Name on a per-host basis. Script makes 2 assumptions: 1. PowerShell already has the PowerCLI Snap-in loaded. 2. You're already connected to a VI-Server. Hope this adds some value to the community!  Enjoy!
OscarDavey, wouldn't that require the OS to be able to mount/read the VMFS itself?  Or are you assuming that's been done?  Either way, for FC attached storage in this case, the lpfc820 driver is required to recognize the HBAs on the server.  Without that driver, it sees nothing.
Hey continuum, will do soon.  Thanks!  Lower priority at this point for me, but still interested in exploring the idea.
lvaibhavt, this KB instructs on how to install patches via ESXi command line: VMware KB: Installing patches on an ESXi 5.x host from the command line Ignore the steps in the KB that instruct you to migrate/shutdown VMs and put host into maintenance mode.  Given the KB for the patch indicates that is not required, you can skip it.  Ultimately, when you proceed to install the patch, if a reboot/maintenance mode is required, it should indicate such and abort the install. But you may need to migrate the VM off and back onto the host to refresh the tools status for the VM.  Either way once the ESXi host has been patched, you need to update tools within the VM itself, which may require a reboot of that VM. Sorry for the significantly delayed response.
No, this specific issue is not linked to application hangs within a VM to my knowledge.  More likely the hang is being caused by an issue with the application itself, or an issue with the Guest OS within the VM.
Are you sure the server rebooted because of this MCE?  That would imply ESXi displayed a purple diagnostic screen showing exception 18. VMware KB: Interpreting an ESX/ESXi host purple diagnostic screen

Did you get a screenshot of the PSOD?  If not, I assume ASR rebooted your machine.  I recommend ASR be disabled so you can capture the PSOD information, allow the host sufficient time to generate its core dump, and ensure a potentially unhealthy ESXi host does not rejoin the cluster automatically. VMware KB: HP Automatic Server Recovery (ASR) in an ESX environment

Here is a KB to help decode an MCE after a PSOD: VMware KB: Decoding Machine Check Exception (MCE) output after a purple screen error

Some MCEs on HP servers are benign or correctable, though it looks like that wasn't the case for you.  As an example of a correctable one: in the BIOS, depending on which CPU you have installed, there may be a setting under Power Management -> Advanced Power Management Options -> SMI Link Power Management.  Per HP for this setting:

"Allows the user to disable power management on the Intel Scalable Memory Interconnect (SMI) link. Disabling this functionality will increase the server’s idle power usage.  While corrected events are considered normal and are expected on the SMI Link and do not affect operation of the platform, the occurrence of these corrected events can be reduced significantly by disabling SMI Link Power Management.  These events are logged as correctable Machine Check Bank 8 and 9 errors in the operating system logs for certain operating systems.  While these events can be ignored, SMI Link Power Management can be disabled to reduce or prevent their occurrence if desired."

Hope this information is helpful.
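To tell a corrected MCE from an uncorrected one, the status value of the reporting bank is the place to look. The sketch below decodes the flag bits of a 64-bit machine check status value in Python, assuming the architectural Intel IA32_MCi_STATUS layout (VAL in bit 63, UC in bit 61, and so on). The function name and both example values are made up for illustration, and the KB above remains the reference for the actual decode procedure.

```python
def decode_mci_status(status):
    """Decode the architectural flag bits of a 64-bit MCi_STATUS value."""

    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: the register holds a logged error
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: hardware could not correct the error
        "enabled": bit(60),          # EN: reporting was enabled for this error
        "misc_valid": bit(59),       # MISCV: MCi_MISC holds extra information
        "addr_valid": bit(58),       # ADDRV: MCi_ADDR holds the error address
        "context_corrupt": bit(57),  # PCC: processor context may be corrupt
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Made-up values: one uncorrected error and one corrected error.
    for value in (0xBE00000000800400, 0x9C00000000000136):
        decoded = decode_mci_status(value)
        kind = "uncorrected" if decoded["uncorrected"] else "corrected"
        print(f"{value:#018x}: {kind}, MCA code {decoded['mca_error_code']:#06x}")
```

A value with the UC bit clear corresponds to the kind of corrected event the HP text describes as safe to ignore.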