tman24's Posts

I'm just putting together a new vSphere 6.7 cluster (latest build). It consists of 4 x ESXi hosts, each with 2 x 16C/32T CPUs @ 2.1GHz. All the CPUs are detected OK, as is hyperthreading, which is enabled and working. What I noticed is that the Total CPU Capacity for each host is being calculated as ~67GHz, whereas based on what I think it should be (64 x 2.1GHz), it should come out at ~134GHz. It would seem that the capacity is being calculated on 32 x 2.1GHz, i.e. without HT. Is this correct, or am I reading it wrong? When I log on to each ESXi host directly, it also shows 67GHz as the total CPU capacity.
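If it's useful to anyone checking the same thing, here's a minimal pyVmomi sketch (the vCenter name and credentials are placeholders) that reads the same values the UI appears to use - physical core count and base clock from the host hardware summary. For a 2 x 16C/32T host at 2.1GHz that works out to 32 x 2.1 = 67.2GHz, matching the ~67GHz shown; the hyperthreads appear in numCpuThreads but don't add to the advertised capacity.

    # Minimal pyVmomi sketch (vCenter name/credentials are placeholders).
    # ESXi advertises host CPU capacity as physical cores x base clock;
    # hyperthreads show up in numCpuThreads but add no capacity.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            hw = host.summary.hardware
            capacity_ghz = hw.numCpuCores * hw.cpuMhz / 1000.0
            print(f"{host.name}: {hw.numCpuCores} cores / "
                  f"{hw.numCpuThreads} threads @ {hw.cpuMhz} MHz "
                  f"-> capacity {capacity_ghz:.1f} GHz")
        view.Destroy()
    finally:
        Disconnect(si)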
Thanks. Like I said, I have seen this problem before, but not on this cluster, and it's only appeared since I applied the latest patches. To have two hosts give the same error within 2-3 weeks was more than a coincidence! You're right, I do need to upgrade, and will do soon, but I'll be running 6.0 for another few weeks at least.
I recently performed a round of upgrades on our main ESXi 6 cluster. vCenter is now running 6.5.15259038 and the cluster hosts are running 6.0.15169789. All the patching went OK, and this is the last version of 6.0 I plan to use before moving to vSphere 6.5. We're running Enterprise, with DRS enabled. Servers are all Dell PowerEdge R540s with the latest firmware/BIOS etc.

Two weeks after applying the latest ESXi patch, I noticed one of the hosts was showing an alarm, with the message 'Unable to apply DRS resource settings on host. This can significantly reduce the effectiveness of DRS.' I have seen this once before, so planned to restart the host management agents. The console on this host was extremely slow to respond though, and in the end became totally unresponsive, as was SSH. The VMs were still responding, but within the hour they all went unresponsive, and the host became unresponsive in the cluster. I tried a proper shutdown from the console, but that just hung, and I had no choice but to hard boot the server. It came back up OK though, and VM operations resumed.

Now, a week or so later, another host in the cluster is reporting the same error. VMs are still running, and the host is responding, but for how long I don't know. The previous build of ESXi we were running never had any DRS issues at all - it's only this latest patch that's causing it, by the looks of it. I'm getting very concerned that this is going to be a regular problem on this build. I've just disabled DRS in the cluster, which isn't ideal, but it has cleared the host error for now. As I said, it only seems to be this latest build causing the issues, although I did apply a minor vCenter patch a couple of weeks before patching the hosts.

Is this a known problem with recent builds? Should I just be looking to upgrade to 6.5 or 6.7 now? 6.0 has certainly done the business for us, and has been very reliable, but I'm now hesitant and a bit concerned. Thanks
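For anyone wanting to do the same without clicking through the UI, here's a minimal pyVmomi sketch of toggling DRS off at the cluster level (the vCenter and cluster names are placeholders); re-running it with enabled=True restores DRS afterwards.

    # Minimal pyVmomi sketch: disable DRS on a cluster (names and
    # credentials are placeholders). Set enabled=True to re-enable.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == "Prod-Cluster")
        view.Destroy()

        spec = vim.cluster.ConfigSpecEx(
            drsConfig=vim.cluster.DrsConfigInfo(enabled=False))
        # modify=True merges the change into the existing cluster config
        task = cluster.ReconfigureComputeResource_Task(spec, modify=True)
        # ... wait on the task as needed
    finally:
        Disconnect(si)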
Noticing this on a new ESXi 6.7 cluster connected to a flash array too, using multipathed 1Gb iSCSI, but the latency spike only shows on the guest, not on the host, if you monitor the same timeframe. The guests don't appear to have an actual problem, but the UI certainly shows read and write latency stretching into many years!
Sorry to follow up on this. While I've now got the 'AMD Opteron Gen 3 without 3DNow!' EVC mode active, it's become pretty obvious very quickly that this mode doesn't expose the AVX/AVX2 CPU features to guests. AVX was introduced in AMD Opteron Gen 4, but that mode doesn't work with EPYC. I keep seeing mention of the 'AMD Zen Generation' EVC mode, and VMware's CPU compatibility guide shows this as a feature of 6.5U1, but it's definitely not there in my list in vCenter. There's little info out there on this, but I'm confused as to why a brand new CPU has to use a very old EVC mode, and isn't even compatible with the later Steamroller or Piledriver AMD CPU EVC modes. Just where is the Zen Generation EVC mode?!
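One way to check whether the Zen baseline exists in a given vCenter build at all is to list every EVC mode it knows about. A minimal pyVmomi sketch (connection details are placeholders); if no amd-zen style key shows up here, the vCenter itself simply doesn't have it yet.

    # Minimal pyVmomi sketch: list every EVC mode this vCenter supports
    # (connection details are placeholders).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        for mode in si.capability.supportedEVCMode:
            print(mode.key, "-", mode.label)
    finally:
        Disconnect(si)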
Thanks for your help. The key was to move the vCenter Appliance off the host; with no other running VMs, I could then change the EVC mode to 'AMD Opteron Gen 3 without 3DNow!', then move the vCenter Appliance back. No other way I could see. I still don't know where the 'AMD Zen Generation' EVC mode is, unless it's not actually in vSphere yet. It seems strange that for a brand new CPU you have to use a pretty old EVC mode.
Thanks for the suggestion. I can confirm that if I try to set the EVC mode to anything other than 'Gen 3 without 3DNow!', I get the following error:

The host's CPU hardware does not support the cluster's current Enhanced vMotion Compatibility mode. The host CPU lacks features required by that mode.

If I try to set the EVC mode to 'Gen 3 without 3DNow!', I actually get a different error:

The host cannot be admitted to the cluster's current Enhanced vMotion Compatibility mode. Powered-on or suspended virtual machines on the host may be using CPU features hidden by that mode.

The only VM I've got running on the host at the moment is the vCenter Appliance, and I can't shut that down, or I won't be able to change the EVC mode! The only way I can see is to migrate the appliance off the host, then try setting the EVC mode again while the appliance is running somewhere else. Will that work?
I'm running the latest build of 6.5U1 on a dual socket EPYC 7351 system. ESXi installed fine, along with vCenter. Everything seems to be working OK, but when I try to enable EVC for AMD CPUs, every option I select says the CPU isn't compatible, and based on VMware's documentation there should also be a 'Zen' option, which on my system there isn't. I've tried 'Opteron Gen 3 (without 3DNow!)', which *should* also work according to the docs, but that says the CPU is incompatible as well! Anyone know what gives? 6.5U1 is meant to be fully compatible with EPYC, so I'm not sure what's going on. Thanks
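To see exactly where the mismatch is, it can help to compare each host's highest supported EVC baseline against whatever the cluster is currently set to. A minimal pyVmomi sketch (connection details are placeholders):

    # Minimal pyVmomi sketch: print each host's highest supported EVC
    # mode next to the cluster's current one (placeholder credentials).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.ClusterComputeResource], True)
        for cluster in view.view:
            print(cluster.name, "current EVC:",
                  cluster.summary.currentEVCModeKey)
            for host in cluster.host:
                print("  ", host.name, "max EVC:",
                      host.summary.maxEVCModeKey)
        view.Destroy()
    finally:
        Disconnect(si)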
Thanks, but there isn't any error. Almost always, I just don't see any host to migrate to other than the host the VM is currently running on when stepping through the 'migrate' wizard. As I have the vMotion license, I should see all hosts in the cluster.
I've just completed an upgrade from 3 x ESXi 5.0 hosts and a vCenter 5.1 appliance (Essentials Plus) to ESXi 6.0.5050593 and VCSA 6.5.5973321. This is a fully supported VMware setup. I upgraded the licenses and applied them to the new setup OK.

One very strange observation is that when I try to vMotion a VM to another host, if I select 'Change compute resource only', invariably the only destination host I'll see is the host the VM is currently running on - no other cluster hosts will show. If, though, I select 'Change both compute resource and storage', I can expand the cluster and see all hosts. This method of migrating the VM will work, but I have to 'fake' a datastore change selection.

Things I've verified:
- It happens in both the Flash and HTML UIs.
- Sometimes I will see all the migration hosts, but not all the time, and very inconsistently.
- The pre-migration message always shows 'Compatibility checks succeeded'.

I thought initially it was out of date VMware Tools, and it certainly looked more promising when Tools was updated to 10.0.9 (build 10249), but then it all seemed to revert back and the same things happen.

Is this a bug? I know I'm not running the latest ESXi version, or even the latest build of ESXi 6.0, but I'm not quite ready for that yet. It never happened in 5.x. Ideas? Thanks
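In the meantime, a compute-only migration can be driven through the API instead of the wizard, which sidesteps the host list entirely. A minimal pyVmomi sketch (the VM and host names are placeholders):

    # Minimal pyVmomi sketch: compute-only vMotion via the API,
    # bypassing the migrate wizard (VM/host names are placeholders).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find(si, vimtype, name):
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vimtype], True)
        obj = next(o for o in view.view if o.name == name)
        view.Destroy()
        return obj

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        vm = find(si, vim.VirtualMachine, "my-vm")
        target = find(si, vim.HostSystem, "esxi02.example.com")
        task = vm.MigrateVM_Task(
            pool=None, host=target,
            priority=vim.VirtualMachine.MovePriority.defaultPriority)
        # ... wait on the task as needed
    finally:
        Disconnect(si)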
We have a pair of high performance Dell R730 servers running 5.5 b3568722. These have Intel X540 10G NICs connected to a SuperMicro-based storage array, again with Intel 10G NICs, running Open-E DSS v7 (a certified VMware storage appliance). The array has a brand new LSI 9361 RAID controller with 1GB cache providing an 8TB RAID5 array with a 250GB LSI CacheCade v2 SSD cache. On paper, this should provide >30,000 IOPS. Benchmark testing in Open-E DSS for the storage shows ~960MB/s READ, ~600MB/s WRITE.

The Dell servers have a point-to-point connection to the storage using CAT6A x-over cables (so, no switch involved). This is a single (Fixed) 10G path, so no MPIO. Storage is presented to the ESXi hosts over iSCSI. I've optimized the ESXi and DSS iSCSI settings to the best performing settings, so from that perspective everything looks OK. I'm running with jumbo frames and an MTU of 9000. Without jumbo frames, write latency is upwards of 250ms, which just kills performance totally. With jumbo frames enabled, write latency is much better, but still peaks at around 30ms, which is still far from ideal.

General performance is still shocking though - around 60MB/s write, falling to 30MB/s on occasion, when cloning from internal host storage to the external array. For a 10Gig link, this is appalling. Looking at disk stats in esxtop, KAVG/s is always 0, but DAVG/s (and thus GAVG/s, since GAVG is DAVG plus KAVG) is hitting upwards of 30, which is the problem - with KAVG at 0, all the latency is coming from the device side rather than the ESXi kernel. As DAVG/s fluctuates lower, performance increases, but never really goes above 60-80MB/s. If I clone from the external array to internal ESXi storage, performance is what I'd expect for what's available. DAVG/s and GAVG/s both stay at or below 1, and write performance to the internal RAID1 mirror on the ESXi host peaks at around 200MB/s, so as far as I can see it's only writes on the external array causing the issue.

What I've tried:
- Upgraded the Intel NIC driver from the OOB version to the current 4.4.1-iov
- Set both the ESXi host and DSS storage array to common identical initiator/target settings
- Tried Delayed Ack off on the ESXi host (no change, so set it back on - see the sketch below)
- Enabled LRO on the DSS array (the ESXi host reports LSO is not an available function)

Nothing really changes or affects performance that I can see. There's no large jump in performance. I know RAID5 isn't the best for writes, but the raw benchmarks show what it should be capable of, and I'm only getting 10% of that.
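For reference, this is roughly how the Delayed Ack toggle can be done through the API rather than the client. A sketch only - it assumes the software iSCSI adapter is vmhba33 and that 'DelayedAck' is the right parameter key on this build, so verify both against your environment:

    # Sketch: toggle Delayed Ack on the software iSCSI adapter via
    # pyVmomi. The host name, adapter device (vmhba33) and the
    # 'DelayedAck' key are assumptions -- check them on your build.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == "esxi01.example.com")
        view.Destroy()

        storage = host.configManager.storageSystem
        opt = vim.host.InternetScsiHba.ParamValue(
            key="DelayedAck", value="false")
        # targetSet=None applies the change at the adapter level
        storage.UpdateInternetScsiAdvancedOptions(
            iScsiHbaDevice="vmhba33", targetSet=None, options=[opt])
    finally:
        Disconnect(si)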
Thanks for the info. On a 1Gb Ethernet connection, you're probably right about the minimal 1-2% performance improvement, but with 10GbE it can be much larger, and the load on the systems is much lower - I'd expect an improvement in the order of >20% with 10GbE. As I said, the settings on both hosts are identical, so it's mighty strange that I've had these problems with only one of them. No errors in any of the config that I can see.
So, I have two ESXi 5.5 hosts connected to some iSCSI storage. Both hosts are identical in every respect, including the dual-port Intel X540-T2 10GbE NIC in the same slots. The storage array has two dual-port X540-T2 NICs. I'm connecting the hosts to the storage using CAT6A x-over cables. The hosts are using the standard ESXi software iSCSI adapter. There are no errors on either host, and the vCenter Appliance is happily talking to both hosts with HA/DRS enabled.

I needed to performance tune the iSCSI connection to the storage, and along with some basic modifications to the iSCSI S/W adapter on each host to match the storage array, I wanted to switch to using 9000-byte jumbo frames. I made the jumbo frame setting changes to the storage, then made the changes to the ESXi hosts. The first host went perfectly - a quick rescan of the HBAs, and the storage was working fine, and much quicker as well.

Then I made the IDENTICAL change to the second host, and it went into a complete fit. First, it completely lost connection to the storage, then I couldn't view any Network Adapter, Storage or Storage Adapter settings. The config pages wouldn't load, and I eventually lost connection from the VI Client. SSH still seemed to work, but no commands would be accepted. The console was also unresponsive, so I had to do a hard reboot. The host started to boot, but was VERY slow when scanning for iSCSI volumes. It did finish booting after about 10 minutes though. At this stage I could get back into the config settings, but the iSCSI volumes weren't showing, even though the iSCSI LUNs were showing as available on the host's iSCSI HBA! I tried to force mount the volumes, and the host went into meltdown again (same symptoms). Another hard boot.

After about an hour of fighting with this, I changed the MTU back to 1500 on the vSwitch connected to the storage array. Almost instantly, the storage re-appeared and re-mounted. Change the MTU back to 9000, same fit. I triple checked everything - all settings looked OK. I switched the host storage connection to an unused 10GbE port on the storage array - same problem. I changed the x-over cable - same problem. Nothing I could do would make the host use a 9000-byte MTU, but the other host is happily using it! Beats me. Got to try and get this sorted before this system goes into production. Anyone got any ideas?
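One thing worth double checking when jumbo frames misbehave like this: the MTU has to be raised on both the vSwitch and the VMkernel port bound to iSCSI - if only one of the two is changed, the symptoms can look exactly like this kind of storage loss. A minimal pyVmomi sketch setting both in one go (the host, vSwitch and vmknic names are placeholders):

    # Minimal pyVmomi sketch: raise the MTU on the vSwitch AND the
    # iSCSI VMkernel port -- both must match for jumbo frames to work.
    # Host, vSwitch and vmknic names are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == "esxi02.example.com")
        view.Destroy()

        net = host.configManager.networkSystem

        # 1. vSwitch: reuse the existing spec, changing only the MTU
        vswitch = next(s for s in net.networkInfo.vswitch
                       if s.name == "vSwitch1")
        spec = vswitch.spec
        spec.mtu = 9000
        net.UpdateVirtualSwitch(vswitchName="vSwitch1", spec=spec)

        # 2. VMkernel port used for iSCSI
        vnic = next(v for v in net.networkInfo.vnic if v.device == "vmk1")
        vnic_spec = vnic.spec
        vnic_spec.mtu = 9000
        net.UpdateVirtualNic(device="vmk1", nic=vnic_spec)

        # Then verify end-to-end from the ESXi shell with:
        #   vmkping -d -s 8972 <storage-ip>
    finally:
        Disconnect(si)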
Yeah, thanks. I had already tried that, but I think there must have been something wrong with the unattend.xml file I was using. I've now had another go, and you're right, it does seem to do the trick, so this should do what I need.
To cut a long story short, when creating Windows 7 templates, is there any way of permanently assigning a guest customization policy to a template? If I manually step through deploying a VM from a template, I can assign a guest customization, but it *has* to be done manually. I want to be able to deploy a VM from a template and have it always use a guest customization automatically. This is on vCenter/ESXi 6.0. I don't mind if it's via the VI or web client. I can set the virtual hardware level to whatever's required.
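Failing a native way to pin the policy to the template, the deploy itself can be scripted so the stored customization spec is always applied. A minimal pyVmomi sketch (the template, spec, cluster and VM names are placeholders):

    # Minimal pyVmomi sketch: deploy from a template with a saved guest
    # customization spec always applied (all names are placeholders).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find(si, vimtype, name):
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vimtype], True)
        obj = next(o for o in view.view if o.name == name)
        view.Destroy()
        return obj

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        template = find(si, vim.VirtualMachine, "win7-template")
        cluster = find(si, vim.ClusterComputeResource, "Prod-Cluster")

        # Pull the saved customization spec by name from vCenter
        cust = si.content.customizationSpecManager.GetCustomizationSpec(
            name="Win7-Standard")

        clone_spec = vim.vm.CloneSpec(
            location=vim.vm.RelocateSpec(pool=cluster.resourcePool),
            powerOn=True,
            template=False,
            customization=cust.spec)  # <- applied on every deploy

        task = template.CloneVM_Task(folder=template.parent,
                                     name="win7-new-vm",
                                     spec=clone_spec)
        # ... wait on the task as needed
    finally:
        Disconnect(si)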
Well, I think I've cracked it. In my case, it looks like disabling / re-enabling HA in the cluster sorted it. This forced a full cluster re-election, and the failed host was then let back in. Surprisingly, the same host that was master before became the master again, but all the 'is bad ip' messages disappeared from the fdm.log file on the failed host. I'm going to let it sit like this over the weekend before migrating any VMs back onto the host next week. Thanks for your help.
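For anyone hitting the same thing, the disable/re-enable cycle can also be scripted. A minimal pyVmomi sketch toggling HA on the cluster (the vCenter and cluster names are placeholders):

    # Minimal pyVmomi sketch: toggle vSphere HA off and back on to
    # force a cluster re-election (names are placeholders).
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def set_ha(cluster, enabled):
        spec = vim.cluster.ConfigSpecEx(
            dasConfig=vim.cluster.DasConfigInfo(enabled=enabled))
        return cluster.ReconfigureComputeResource_Task(spec, modify=True)

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="password", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == "Prod-Cluster")
        view.Destroy()

        task = set_ha(cluster, False)
        # ... wait for the disable task to finish, then re-enable:
        task = set_ha(cluster, True)
    finally:
        Disconnect(si)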
That's a very strange 'fix'. I was going to try turning off HA in the cluster, then re-enabling it to see if that helped. You suggest it won't, but I think I'll have to give it a try. I'll report back, but your 'solution' is one I might have to try if nothing else helps.
Thanks for the reply. MTU is definitely set to 1500 on all management interfaces. Never used jumbo frames on our vSphere hosts. I can also confirm that the second IP used for NFS definitely ISN'T configured for management traffic. I can also confirm that:
- Remove/re-add of the ESXi host doesn't fix it
- Disconnect/reconnect doesn't fix it
We have a 4 node vSphere 5.0 cluster, and I've just been finishing off replacing two of the older nodes with brand new (identical) servers. The first host came across fine and joined the cluster without a problem. The second one will join, but HA will not initialize correctly. After a period of time, I just get 'vSphere HA Agent Unreachable' next to the new host. I've dug around and have found quite a few references to this problem, but nothing seems to fix it. I'm reasonably sure that the very first time I added the new host, everything looked OK in vCenter. I then proceeded to up the EVC level (part of the plan), and although I can't say it happened exactly when I did this, on or around the same time I started getting the error.

What I've tried:
1. Full reboot of the new host (no change)
2. Checking all management IPs on the hosts (all OK)
3. ICMP check between all cluster hosts and vCenter (all OK)
4. Checked DNS (all OK)
5. Checked 'vCenter requires verified SSL host certificates' (all OK)
6. Reverting the EVC level back to the original setting (did not fix it)

I've had a look at vpxa.log, and there doesn't seem to be any problem I can see there, but fdm.log is reporting the following:

2014-11-14T14:51:08.306Z [FFD64B90 verbose 'Cluster' opID=SWI-e6ab007a] [ClusterManagerImpl::IsBadIP] 192.168.xxx.122 is bad ip
2014-11-14T14:51:08.306Z [FFD64B90 verbose 'Cluster' opID=SWI-e6ab007a] [ClusterManagerImpl::IsBadIP] 192.168.xxx.113 is bad ip
2014-11-14T14:51:08.359Z [FFD64B90 verbose 'Cluster' opID=SWI-e6ab007a] [ClusterManagerImpl::IsBadIP] 192.168.xxx.116 is bad ip

In this case, xxx.113 is the management IP of another host in the cluster, xxx.116 is the management IP of the cluster master, and xxx.122 is another IP address bound to one of the other hosts, used only for mounting NFS volumes (not management traffic). It just repeats these three addresses over and over as 'is bad ip' in fdm.log. There's nothing wrong with these addresses - they're not duplicates, and they're all in DNS correctly.

So, while I can pretty much see the errors causing this, I don't know how to fix it. Could really do with some advice.
Nothing I could do in the end. I had to blitz the entire repository and start again. There's very little information on this particular error online, and what is available from VMware is ridiculously high-level and doesn't even begin to actually tackle the problem. If anyone else gets this problem, then good luck.