cesprov's Posts

Seeing a strange issue with VMware Tools here that I don't see anyone else talking about with respect to the Spectre variant 2 vulnerability.  I have a vSphere environment that is fully patched through the 6.0U3e vCenter and the ESXi 3/20/18 patches.  Within that environment, I have a bunch of CentOS 6.9 servers.  We have slacked off for quite some time on upgrading the VMware Tools within these CentOS VMs, so most are still running a 5.1.x version, 9.10.1.47876 (build-2791197).  These VMs are fully patched with the latest kernel, so they shouldn't be vulnerable to Meltdown or Spectre.  Prior to upgrading the VMware Tools, the Meltdown/Spectre check script, found here, reports them as not vulnerable to all three, as shown below:

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel has array_index_mask_nospec (x86):  NO
* Kernel has the Red Hat/Ubuntu patch:  YES
* Kernel has mask_nospec64 (arm):  NO
> STATUS:  NOT VULNERABLE  (Mitigation: Load fences)

CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Mitigation 1
  * Kernel is compiled with IBRS/IBPB support:  YES
  * Currently enabled features
    * IBRS enabled for Kernel space:  NO
    * IBRS enabled for User space:  NO
    * IBPB enabled:  NO
* Mitigation 2
  * Kernel has branch predictor hardening (arm):  NO
  * Kernel compiled with retpoline option:  YES
  * Kernel compiled with a retpoline-aware compiler:  YES  (kernel reports full retpoline compilation)
> STATUS:  NOT VULNERABLE  (Mitigation: Full retpoline)

CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel supports Page Table Isolation (PTI):  YES  (found 'CONFIG_PAGE_TABLE_ISOLATION=y')
* PTI enabled and active:  YES
* Running as a Xen PV DomU:  NO
> STATUS:  NOT VULNERABLE  (Mitigation: PTI)

Then if I upgrade to the latest version of VMware Tools for 6.0U3, 10.1.10.63510 (build-6082533), and re-run the script, it now shows the VM is vulnerable to Spectre variant 2:

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel has array_index_mask_nospec (x86):  NO
* Kernel has the Red Hat/Ubuntu patch:  YES
* Kernel has mask_nospec64 (arm):  NO
> STATUS:  NOT VULNERABLE  (Mitigation: Load fences)

CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'
* Mitigated according to the /sys interface:  NO  (kernel confirms your system is vulnerable)
* Mitigation 1
  * Kernel is compiled with IBRS/IBPB support:  YES
  * Currently enabled features
    * IBRS enabled for Kernel space:  NO
    * IBRS enabled for User space:  NO
    * IBPB enabled:  NO
* Mitigation 2
  * Kernel has branch predictor hardening (arm):  NO
  * Kernel compiled with retpoline option:  YES
  * Kernel compiled with a retpoline-aware compiler:  UNKNOWN
> STATUS:  VULNERABLE  (Vulnerable: Retpoline with unsafe module(s))

CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel supports Page Table Isolation (PTI):  YES  (found 'CONFIG_PAGE_TABLE_ISOLATION=y')
* PTI enabled and active:  YES
* Running as a Xen PV DomU:  NO
> STATUS:  NOT VULNERABLE  (Mitigation: PTI)

Using this to determine what the "unsafe module(s)" are shows:

VULNERABLE - No Retpoline found - vsock
VULNERABLE - No Retpoline found - vmci

Obviously these are VMware Tools components.  These were not reported as a problem with VMware Tools 9.10.1.47876 (build-2791197) but are being reported with 10.1.10.63510 (build-6082533).  What's the deal?  Are the VMs reporting those two modules really vulnerable?  And if so, when can we expect VMware to fix this?
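A couple of quick ways to sanity-check this without the full script.  This is a minimal sketch: the sysfs file is the kernel's standard reporting interface, but the nm check is only my rough approximation of what the checker script does (it flags modules built without retpoline thunks), so treat the exact logic as an assumption:

# The kernel's own verdict for variant 2 (needs a kernel new enough to expose it):
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2

# Rough per-module check: a retpoline-built module should reference the
# __x86_indirect_thunk_* symbols in place of raw indirect calls/jumps.
for mod in vsock vmci; do
    path=$(modinfo -n "$mod")
    echo "$mod: $(nm "$path" | grep -c __x86_indirect_thunk) thunk references ($path)"
done

A module showing 0 thunk references is roughly what the script reports as "No Retpoline found".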
The link in my post is the same KB you posted and does not apply here.  The 3010 is not occurring on any of the files mentioned in that article.  The "in use" files are python.exe and python27.dll, and they aren't loaded into memory until the install gets to the point of "VMware Python components", at which point VMware-python.msi or some sub-process of it loads the files into memory, and the MSI then trips over itself, detecting the files that it just loaded as being in use. Let me stress, I am not looking for help here.  I KNOW what caused the problem and in my case it is definitive.  This is more or less a PSA for anyone else who runs into this problem.
Been trying to upgrade to 6.0U3e (from 6.0U3d) all morning and it is constantly failing on the "VMware Python components" right at the beginning of the install with the dreaded 3010 error.  I've upgraded this vCenter countless times so I am familiar with all the files that get locked and cause 3010s (see VMware Knowledge Base).  This is the first time I have seen the 3010 appear on the Python components part though.  Prior to the install, none of the python files from C:\Program Files\VMware\vCenter Server\python are open in memory, as seen through Process Explorer.  While the "VMware Python components" section is displaying in the upgrade, if you refresh a search for open "python" files in Process Explorer, you can clearly see that the install itself has now opened quite a few files in C:\Program Files\VMware\vCenter Server\python.  The install then crashes with a 3010 error.  Checking the pkgmgr-comp-msi.log afterwards, you see these errors occurred:

MSI (s) (A8:48) [12:49:37:515]: Executing op: RegisterSharedComponentProvider(,,File=python27.dll,Component={8C4A7097-E243-54DF-82C4-F45CA52DE496},ComponentVersion=2.7.13150.1013,ProductCode={DDB1C8B6-82DB-4FD5-A7AB-B1DA7B31E771},ProductVersion=6.0.0,PatchSize=0,PatchAttributes=0,PatchSequence=0,SharedComponent=0,IsFullFile=0)
MSI (s) (A8:48) [12:49:37:531]: Executing op: FileCopy(SourceName=python27.dll,SourceCabKey=python27.dll,DestName=python27.dll,Attributes=1536,FileSize=3046328,PerTick=65536,,VerifyMedia=1,,,,,CheckCRC=0,Version=2.7.13150.1013,Language=0,InstallMode=126353408,,,,,,,)
MSI (s) (A8:48) [12:49:37:547]: File: C:\Program Files\VMware\vCenter Server\python\python27.dll; Overwrite; Won't patch; Existing file is of an equal version
MSI (s) (A8:48) [12:49:37:547]: Source for file 'python27.dll' is compressed
MSI (s) (A8:48) [12:49:37:547]: Re-applying security from existing file.
MSI (s) (A8:48) [12:49:37:547]: Note: 1: 2205 2:  3: Error
MSI (s) (A8:48) [12:49:37:547]: Note: 1: 2228 2:  3: Error 4: SELECT `Message` FROM `Error` WHERE `Error` = 1603
MSI (s) (A8:48) [12:49:37:547]: Product: VMware-python. The file C:\Program Files\VMware\vCenter Server\python\python27.dll is being used by the following process: Name: python , Id 5368.
MSI (s) (A8:48) [12:49:37:547]: Verifying accessibility of file: python27.dll
Info 1603.The file C:\Program Files\VMware\vCenter Server\python\python27.dll is being held in use.  Close that application and retry.
MSI (s) (A8:48) [12:49:36:938]: Executing op: FileCopy(SourceName=python.exe,SourceCabKey=python.exe,DestName=python.exe,Attributes=1536,FileSize=42936,PerTick=65536,,VerifyMedia=1,,,,,CheckCRC=0,,,InstallMode=126353408,HashOptions=0,HashPart1=269396038,HashPart2=-283905172,HashPart3=994898176,HashPart4=-1453880127,,)
MSI (s) (A8:48) [12:49:36:938]: File: C:\Program Files\VMware\vCenter Server\python\python.exe; Overwrite; Won't patch; Existing file is unversioned and unmodified - hash doesn't match source file
MSI (s) (A8:48) [12:49:36:938]: Source for file 'python.exe' is compressed
MSI (s) (A8:48) [12:49:36:938]: Re-applying security from existing file.
MSI (s) (A8:48) [12:49:36:938]: Note: 1: 2205 2:  3: Error
MSI (s) (A8:48) [12:49:36:938]: Note: 1: 2228 2:  3: Error 4: SELECT `Message` FROM `Error` WHERE `Error` = 1603
MSI (s) (A8:48) [12:49:37:375]: Product: VMware-python. The file C:\Program Files\VMware\vCenter Server\python\python.exe is being used by the following process: Name: python , Id 5368.
MSI (s) (A8:48) [12:49:37:375]: Verifying accessibility of file: python.exe
Info 1603.The file C:\Program Files\VMware\vCenter Server\python\python.exe is being held in use. Close that application and retry.

It's pretty clear that the installation is creating its own locks on the files in C:\Program Files\VMware\vCenter Server\python and thus causing the 3010 errors.  The only way I was able to get around this problem was to launch the install and, before it gets to the "VMware Python components" part (which is right near the start), quickly rename the entire C:\Program Files\VMware\vCenter Server\python folder.  The install then seems to completely skip over that component, and it does not put the C:\Program Files\VMware\vCenter Server\python folder back, so before the install finishes, you need to quickly rename C:\Program Files\VMware\vCenter Server\python back again.  This was not a one-off, as it took me 6 install attempts to debug what was actually causing this problem.  The sad thing is that if you manually install X:\vCenter-Server\Packages\VMware-python.msi, the files in it are exactly the same as those installed by 6.0U3d, so it's not even necessary for this package to be reinstalled.  If you run into this problem coming from anything besides 6.0U3d, you may need to manually install X:\vCenter-Server\Packages\VMware-python.msi afterwards.

I know VMware has effectively given up on the Windows vCenter, but the lack of QA on these releases is astounding.  I mean seriously, when the 3010 error appears, at least give a retry option so you don't have to wait an hour for the entire f'in install to roll back before you can attempt to fix the problem.
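If you want to confirm who holds the locks before resorting to the rename trick, here's a quick sketch assuming Sysinternals handle.exe is on your PATH (Process Explorer's Find Handle dialog does the same thing), run from an elevated prompt:

rem List every process holding an open handle to anything under the
rem vCenter python folder; run it again while the "VMware Python
rem components" step is on screen to catch the installer's own locks.
handle.exe -nobanner "C:\Program Files\VMware\vCenter Server\python"

In my case this pointed at the install's own python process (Id 5368 in the log above), which is what led me to the rename workaround.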
For the record, this was not yet publicly available when I posted this.  :smileylaugh:
Intel released the fixed Spectre microcode on 02/20/18.  It looks like most of the hardware vendors have released it as BIOS upgrades already, some the same week Intel released it.  Here we are now exactly a month later and still nothing from VMware on when they will be re-releasing their patch.  What's the ETA on this?
I can confirm this is still an issue on the 6.0U3b installation as well, as I just ran into it.
Did you see this one? ESXi host fails with intermittent NMI purple diagnostic screen on HP Gen8 servers (2085921) | VMware KB
Are both the source and destination host on the same subnet?  If so, are the subnet masks set right for both?  I just ran into a similar issue the other day.  In my case I had just added three new hosts and had this issue vMotioning VMs from one host to any of the three new hosts.  It turned out the subnet mask on the source host was incorrectly set to a /26 instead of a /25, and the new hosts were just outside the /26, so it was trying to route the vMotion traffic out the VLAN interface and presumably back in the same interface. Also, are jumbo frames in play here?  If so, make sure the MTU is set on all vmkernels, vswitches, etc.; see the quick checks below.
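Here is a minimal sketch of the checks I'd run from the ESXi shell on both hosts (the address at the end is a placeholder for the destination host's vMotion vmkernel IP):

# Address and netmask of each vmkernel; compare the masks between hosts.
esxcli network ip interface ipv4 get

# MTU of each vmkernel, to verify jumbo frames are set everywhere.
esxcli network ip interface list

# End-to-end jumbo frame test: 8972 bytes of payload plus the IP/ICMP
# headers fills a 9000-byte frame, and -d forbids fragmentation.
vmkping -d -s 8972 192.168.1.50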
A separate PSC isn't "required" but it would be desirable.  It sounds like you currently have an embedded PSC in your main site.  Since both are in the same SSO domain, it's doubtful you currently have two embedded PSCs, as two embedded PSCs can't sync with one another.  Which means your DR site may be using the main site's PSC.  If that's the case and you lose the main site, you may lose your ability to log in to your DR vCenter, which kind of defeats the purpose.  While you don't have to, you should look into migrating your embedded PSC at the main site to an external PSC and syncing it with the embedded one.  Then change the main site's vCenter to use that new external PSC.  Then install a new PSC at the DR site and sync it with the main site's PSC.  Lastly, set the DR vCenter to use the DR site's new PSC.  That way, if you lose either site, you will still be able to log in to the surviving site's vCenter using its local PSC. I should add that you may not need to migrate the embedded PSC at the main site to an external one first.  As long as you don't install the one at the DR site as an embedded one also, they should be able to sync.  But ideally your environment should have an external PSC on both sides.
Sorry for the late reply.  I got the beta driver from Intel, but the 1.6.6 number was some sort of internal versioning.  It was officially released as 1.4.28 and can be found here: VMware Compatibility Guide - I/O Device Search
I have a P2000 G3 connected to an older DL580 G7 and it's running 6.0U1b without any issues.
v1.6.6 was an internal beta version apparently.  It was released publicly as 1.4.28.  Get it at VMware Compatibility Guide - I/O Device Search.  The i40e-based NICs were somewhat unstable prior to driver v1.4.28 and firmware v5.02, so make sure you're on at least those versions; a quick way to check is below.  Dell should have the 5.04 firmware for the X710s on their site by now too.
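If you want to verify what a host is actually running, a quick sketch from the ESXi shell (vmnic0 is a placeholder for whichever uplink is the X710):

# Driver name/version and NIC firmware for one uplink.
esxcli network nic get -n vmnic0

# Version of the installed i40e driver VIB.
esxcli software vib list | grep -i i40e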
What NIC are you using and what driver/firmware?  Run ethtool -i <vmnic> and you should see something like:

driver: i40e
version: 1.4.26
firmware-version: 5.02 0x8000222e 17.5.10
bus-info: 0000:01:00.0
I have now pinpointed the exact bug in vCenter that causes this issue and it appears to affect every install of 6.0b through 6.0U2 (I didn't test 6.0 RTM or 6.0a).  On v5.5U2 vCenter databases, the VPX_STAT_COUNTER table has the following indexes:

IX_VPX_SC_ENTITY_ID
VPX_STAT_COUNTER_M1
VPX_STAT_COUNTER_M2
VPX_STAT_COUNTER_P1
VPX_STAT_COUNTER_U1

After upgrading to 6.0b through 6.0U2, or even on a brand-new 6.0U2 install (i.e. no upgrade), the VPX_STAT_COUNTER table has only the following two indexes:

PK_VPX_STAT_COUNTER
VPX_STAT_COUNTER_P1

The problem is that the stats_rollup2_proc stored procedure (the SQL Agent task "Past Week stats rollup" calls l_stats_rollup2_proc, which in turn calls stats_rollup2_proc) contains a reference to one of these missing indexes:

SET @sqlCommand_rt3 = 'INSERT INTO ' + ... + ' VPX_STAT_COUNTER SC  WITH(INDEX(VPX_STAT_COUNTER_M1) ...

This causes the execution of the SELECT statement within that INSERT to return:

Msg 308, Level 16, State 1, Line 1
Index 'VPX_STAT_COUNTER_M1' on table 'VPX_STAT_COUNTER' (specified in the FROM clause) does not exist.

The effect is that the SELECT statement within @sqlCommand_rt3's INSERT statement, which is supposed to read VPX_STAT_DEF.ROLLUP_TYPE = 3 counters from the HIST2 tables, returns 0 rows due to the above error, meaning VPX_STAT_DEF.ROLLUP_TYPE = 3 counters from the HIST2 tables are never rolled up into the HIST3 tables because @sqlCommand_rt3 isn't SELECTing any data.  To make matters worse, the error handling in this stored procedure is not catching this error, or at least not reporting it up to the SQL Agent, so the history shows this task as running successfully when it clearly is not.

What VMware support seems to be implying is that the VPX_STAT_COUNTER indexes are erroneously missing from 6.0 installs and should be there.  Either that or those indexes were removed on purpose and someone forgot to update the stored procedures to use one of the remaining indexes (my testing indicates that changing the stored procedure to use VPX_STAT_COUNTER_P1 instead may also fix the issue).  Either way, based on my testing, this affects every 6.0 vCenter install; upgrade or new install doesn't matter.  Most people probably don't notice the issue, as most .latest counters don't seem to be exposed through the GUIs.  Unless you are specifically looking for .latest counters in interval 7200 using PowerCLI, or have a third-party program that looks for them as was my case, you probably don't even notice you're missing this performance data from the 7200 interval and beyond.

The fix is simple: recreate the missing indexes on the VPX_STAT_COUNTER table, and the .latest counters then roll up to 7200/HIST3 properly again the next time the task executes.  By default the Weekly and Monthly tasks run at 2:15 AM, so you need to wait until day 2 to see the .latest counters appear for the 86400/HIST4 interval.
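If you want to check whether your own vCenter database is affected, here's a minimal sketch to run against the vCenter DB.  I'm only showing the check; the column definitions for the missing indexes should come from a 5.5 database or from VMware support, so I'm deliberately not guessing at the CREATE INDEX statements:

-- List the indexes currently on VPX_STAT_COUNTER.  An affected 6.0
-- install shows only PK_VPX_STAT_COUNTER and VPX_STAT_COUNTER_P1;
-- a healthy 5.5U2 database also shows VPX_STAT_COUNTER_M1 and friends.
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.VPX_STAT_COUNTER')
  AND i.name IS NOT NULL
ORDER BY i.name;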
Updating this in case anyone else finds it, as I now know this is a much wider issue than just us.  The vendor of the third-party monitoring software has told us that they are now seeing this with other clients' 6.0 vCenters, so it's not unique to our vCenter install.  I can also now reproduce the problem by restoring our pre-upgrade 5.5U2 database (where all .latest counters are present in all 4 intervals) to a new vCenter install and then upgrading it to 6.0b.  After the upgrade, I am missing the .latest counters from both the 7200 and 86400 intervals, which is worse than in my production system, where I am only missing the .latest counters from the 7200 interval.  So I think it's more than obvious that this is some sort of bug in the 6.0b install.  Not sure if the same issue will occur upgrading from 5.5U2 to any other 6.0 version, as I haven't tested anything other than 6.0b yet.  I am trying to get this escalated within VMware support to determine the actual cause and fix instead of reinstalling a new vCenter as support advised.  Like I am really going to nuke my entire vCenter because 4 performance counters are missing /rolleyes.
It appears the article posted by Bleeder above, KB2144799, now indicates this was fixed in ESXi600-201608401-BG and ESXi550-201608401-BG, released 8/4/16.  I haven't installed any of the patches released on 8/4 yet, so I can't confirm.  While the issue did seem to be less frequent after dropping our stats collection levels, as someone above indicated, that didn't completely fix it; I just witnessed this problem again the other day.  I also never increased the memory above the default as the previous workarounds mentioned.
v1.4.28 has been released and supposedly contains the fix from the beta 1.6.6 driver I am using.  You can find it here:  VMware Compatibility Guide - I/O Device Search.  I have not installed 1.4.28 yet, but the 1.6.6 beta driver seemed to clear up a lot of my issues.  My recommendation is to make sure you're on at least firmware 5.02, or better yet 5.04.
v1.4.28 of the i40e driver is posted at VMware Compatibility Guide - I/O Device Search.  It supposedly contains the fixes I mentioned above.  The beta version 1.6.6 must be an internal number or something.
You say you have QLE3242 NICs in your Dell R620s.  These are QLogic cards, so I assume they are not using the i40e driver?  The i40e driver is for the newish line of Intel NICs, generally the X710 family, which I believe is supplanting the X5x0 series (ixgbe driver) of Intel NICs.  If they don't use the i40e driver, you won't have this particular problem.  That's not to say whatever driver the QLogic cards use won't have its own issues, but you shouldn't see this particular issue. We've had a Dell R630 since May/June 2015 with an X710 (DA4, I believe, offhand), where two of the NICs are for iSCSI and the other two for normal network traffic.  The NICs were problematic from day one.  Aside from the issues I mentioned above, which are well documented by now and should fix your immediate issue, there are still outstanding issues with that NIC that cause TX/transmit failures, which make the NIC stop transmitting and can essentially crash the host, but the VMs won't HA-failover until you bounce the affected host.  I had to throw a fit with Dell to get them to address these issues with Intel.  Dell has a beta 1.6.6 driver for the i40e that seems to have fixed those issues (or I just haven't had a recurrence yet), along with an upgrade to firmware 5.02, so I'm not sure if it's the driver itself or the firmware.  The driver's not public yet, and there's no timeframe on when it goes public.  If you run into those same issues ("TX driver issue detected, PF reset issued" in the vmkernel.log), open a Dell case and have it escalated, as the front-line techs probably won't know about this and will have you run all sorts of unrelated time-filling tasks.  Be adamant about escalation.  There are people there that know about this.
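If you want to watch for that signature yourself, a one-liner from the ESXi shell (the path below is the standard live log location; adjust if you redirect logs to a syslog server):

# Look for the i40e TX hang signature in the current vmkernel log.
grep "TX driver issue detected" /var/log/vmkernel.log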
So this issue was brought to us by a third-party company that uses the performance data to analyze overall usage of VMware environments, including SANs, networking, etc.  They are now stating that they are seeing this problem with other customers, and it appears to be related to vCenter 5.5 upgrades to 6.0.  I am 99.9999% confident this is a bug at this point.  I have a case opened with VMware, but getting them to actually test the upgrade scenarios needed to determine how and when this occurs is like pulling teeth.