MattMeyer's Posts

I believe this is using consumed memory + memory overhead. That is the same metric used for the graph in the first screenshot, as you already noted. The amount of available resources is calculated by removing the capacity of N hosts, where N is the number of host failures set in the Admission Control settings. If the sum of consumed memory exceeds the cluster resources remaining after N hosts are removed, you get the warning. This warning is not controlled by reservations.
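As a back-of-the-napkin check, the warning logic works out roughly like this (all the numbers below are made up for illustration, not taken from your cluster):

```shell
# Hypothetical cluster: 4 hosts x 128 GB each, Admission Control set to tolerate 1 host failure
hosts=4
per_host_gb=128
failures_tolerated=1
consumed_gb=350   # sum of consumed memory + memory overhead across all VMs (example value)
available_gb=$(( (hosts - failures_tolerated) * per_host_gb ))
if [ "$consumed_gb" -gt "$available_gb" ]; then
  echo "WARNING: insufficient failover resources"
else
  echo "OK: ${available_gb} GB usable after ${failures_tolerated} host failure(s)"
fi
```

With these numbers, 350 GB consumed fits inside the 384 GB left after removing one host, so no warning; push consumed above 384 and you'd see it.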
I was just having the same problem in my lab. The PSC/VCSA were also upgraded from 6.5 to 6.7. To get it working I SSH'd into the VCSA and changed the password using 'passwd' to something simple. I chose "VMware1!". Then I headed over to the Appliance Management Interface on port 5480 and was able to get right in using root/VMware1!. I then logged out, headed back to the SSH session, and changed the password back to the original strong password. Then back to the VCSA:5480, and wouldn't you know it? The original password worked this time.
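For anyone who wants the exact sequence, it was just this ("VMware1!" is only an example temporary password; use anything simple):

```shell
# On the VCSA, over SSH:
passwd          # set a temporary simple password, e.g. VMware1!
# Log in to https://<vcsa-fqdn>:5480 as root with the temporary password, then log out.
# Back in the SSH session:
passwd          # set the original strong password again
```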
Sorry I'm late to the party. Setting "127.0.0.1" as the syslog host will suppress the warning.
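On ESXi 5.x and later this can be done from the command line via the esxcli syslog namespace (double-check the syntax against your version's docs):

```shell
esxcli system syslog config set --loghost='127.0.0.1'
esxcli system syslog reload
```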
It's not possible without some kind of shared storage. The standby host needs to read the VM data to power the VM back on. vSAN is an option too.
Actually, the entire technology for this feature changed, not just the behavior. There were a few reasons for this.

1. The SMP VM was always the top request from customers looking to use this feature. The old technology (record/replay) was not able to monitor multiple threads and replay them on the secondary machine without significant performance issues. One thread is easy since there is nothing to keep in sync. With SMP, threads need to be replayed in the exact same order on the secondary host as on the primary. A single vCPU is basically FIFO, but with SMP not so much, so you can't just replay the execution of each thread.

So now FT acts more like an xVmotion (vMotion + Storage vMotion) that never completes. The vMotion part monitors the memory bitmap. As memory changes, the changes are sent to the secondary VM world running on the second host. The difference from a real vMotion is that there are many, many, many more checkpoints involved. A checkpoint is a window of time that is checked for changes to the memory bitmap. A regular vMotion allows the checkpoints to be really big at first; then, as you get towards the end of the vMotion, more checkpoints are added to move less data between each checkpoint. The switchover happens once it's determined that the rest of the dirty memory can be transferred in under 1 second; if not, it keeps trying until there is a moment it can. FT does this much the same way, but on steroids. We are talking sub-millisecond checkpoints here, because we don't have the luxury of the 1-second window a vMotion can tolerate. There is other low-level geek stuff that differs from a regular vMotion too, but that's the general way memory is now synced.

Next is the way CPU is handled. With legacy FT, aka RR FT (record/replay FT), everything was played back exactly like the original. With the new generation of FT, the thread is executed on the primary host, but only the result is sent to the secondary. This is also how vMotion does things.
When you think about it, it's not entirely necessary for the secondary VM to replay everything, as long as it has the result of the last thing that was processed on the primary. If there is a failure, the secondary will pick up where the primary left off.

Finally, storage. Since the secondary VM is not replaying everything that happened on the primary VM, it doesn't need to read the disks for any I/O. So we do this differently now too. Similar to a Storage vMotion, FT will send any write I/Os to the secondary VM to be committed there before being committed on the primary. Read I/O really doesn't make a difference since the secondary VM doesn't need to read anything again. There are benefits to this now, because you can store a secondary copy of the data on a completely different set of disks, or even a different array, for increased redundancy. The drawback is double the required storage.

Another benefit of the new technology is that the hosts don't need to have identical CPUs. With RR FT the CPUs pretty much needed to match, since the entire thread was executed again; if instruction sets used on one CPU are not available on the other, a whole lot of bad things happen. Since the new FT is more like a vMotion, as long as the CPUs can participate in the same EVC mode, they can be used for FT too.

2. The other reason it was time to move on was the evolution of the silicon. Certain features of the CPU were not guaranteed to stick around in future generations. We needed a way to guarantee the feature without being tied to specific features of the CPU, so the new FT was developed.

Hope this helps explain how it all works, and why things changed.
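The pre-copy convergence idea (copy dirty memory, see what got dirtied meanwhile, repeat until the remainder fits in the cutoff window) can be sketched as a toy model. The numbers, the fixed dirty rate, and the 1-second cutoff are my own illustrative assumptions, not actual FT or vMotion internals:

```shell
# Toy model of pre-copy convergence: keep copying dirty memory until what's
# left can be transferred within the cutoff window.
dirty_mb=8192             # memory still dirty (assumed starting value)
link_mb_per_s=1000        # assumed transfer rate
dirty_rate_mb_per_s=400   # assumed rate at which the VM dirties memory
cutoff_s=1                # vMotion-style cutoff; FT would use a far smaller window
pass=0
while [ $(( dirty_mb / link_mb_per_s )) -gt "$cutoff_s" ]; do
  pass=$(( pass + 1 ))
  copy_s=$(( dirty_mb / link_mb_per_s ))         # time spent sending this pass
  dirty_mb=$(( copy_s * dirty_rate_mb_per_s ))   # memory dirtied while sending
done
echo "converged after ${pass} passes; final transfer ${dirty_mb} MB"
```

With these made-up numbers it converges in two passes; FT effectively runs this loop forever with sub-millisecond windows instead of ever cutting over.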
The FT VM has to be managed by a single vCenter server, even in 6.5.
You are looking for vSphere HA, not Proactive HA. vSphere HA will restart VMs from failed hosts on healthy hosts. Proactive HA communicates with a hardware vendor's monitoring solution to detect failed or unhealthy components, things like fans and power supplies. When those failures are detected, vSphere DRS will begin migrating VMs off the affected host before a component failure causes an outage.
Hi, That's not the purpose of the heartbeat datastore.  The HB datastore is intended for a host to update the datastore to inform the HA master that it's either partitioned or isolated.  If partitioned, no HA restart will happen.  When isolated, the isolation response will kick in.  In your case, the VMs would still not restart even when a host is determined to be isolated.  This is because vSphere 5.1 HA determines whether a host is dead by monitoring the management network.  I am assuming that the host management network was available the entire time during the storage outage. If you want vSphere HA to recover from a storage outage that is impacting only a subset of hosts in a cluster, you will want to look into VM Component Protection, which is included with 6.0 and greater.  It offers protection from a storage outage and will restart VMs on other hosts in the cluster.
I just tested this to see what would happen.  I was expecting the power-on operations to fail, but they didn't.  The VMs powered on just fine.  It seems that vCD handles answering the confirmation for the host DRS recommends the VM be powered on.  Normally, the administrator would need to apply the recommendation, but with vCD in place this is handled for you, and the VMs get powered on.  I learned something new.  This was tested using vCD 8.10.0. That said, I would never recommend this, since DRS will never move a VM when a host in the cluster becomes overloaded.  If vCD is used, Fully Automated mode is the best recommendation.
A datastore heartbeat and a VM Tools heartbeat are two completely different things. They do share the same name though, which I concede can be confusing. The datastore heartbeat in that image is simply a way for an HA node to determine whether it's isolated from other hosts when the management network is down. If that is not checked, HA will not have that additional mechanism to determine the host state (partitioned, isolated, or failed). The VM Tools heartbeat is something different: when OS monitoring is enabled, HA looks for those heartbeats, and then for I/O on the actual VMDK for that VM. It has nothing to do with the datastore heartbeat selection.
The VM Tools heartbeat is the first check to determine if the monitored VM is unresponsive. After a heartbeat is not received, HA will look for I/O coming from that VM. We look for both network and storage I/O. If I/O is detected, a restart will not occur. If no I/O is seen and heartbeats are not detected, there's a pretty high likelihood the OS within the VM has hung, and HA will reset the VM.
This is similar to how the setup was tested for the performance whitepaper on the legacy version of FT that can be found here: https://www.vmware.com/files/pdf/perf-vsphere-fault_tolerance.pdf I cannot think of a reason why it wouldn't be supported with a direct connection (no switch) in the communication channel.  The configuration for FT on the host is the same.  You may have to come up with some creative ways to upgrade the hosts without a 3rd host.  You will also give up FT's automatic remediation after a host failure.  vSphere HA will normally restart a failed VM on another host and reconfigure FT sync for you; without another place to run the VM, you'll be left in a reduced-redundancy state. TL;DR: It will work fine and be supported, but you'll lose some things.
Hi Bradley, Can you please point me where you saw that 10GbE was only recommended, and not required?  I want to make sure this gets corrected.  As you already mentioned, FT can be really "bursty", exceeding 1GbE speeds pretty quickly.  In some of my tests these spikes can even occur within fractions of a second, so you'll never capture them using conventional tools like ESXTOP.  When the FT network cannot keep up, it results in latency within the app. I double checked just to be sure, and the official documentation does state that a 10GbE dedicated network is required. vSphere 6.0 Documentation Center - Fault Tolerance
I fixed this by disconnecting the server from VC (not removing it, just a disconnect). Restart the VC service on the VC server. Manually uninstall the VC agents from the ESX host. Reconnect th... See more...
I fixed this by disconnecting the host from VC (not removing it, just a disconnect). Restart the VC service on the VC server. Manually uninstall the VC agents from the ESX host. Reconnect the ESX host to VC. It should ask for a user/pass and upload the agent.
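On classic ESX of that era, the manual agent uninstall was along these lines; exact package names vary by build, so treat this as a sketch and go by what the query actually returns:

```shell
# On the ESX host's service console, after disconnecting it from VC:
rpm -qa | grep -i vpxa    # find the installed vCenter agent package name
rpm -e VMware-vpxa        # remove it, substituting the exact name from the query above
```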
I'm having the same issue as well. The entire cluster was upgraded from 3.5 to 4.0u1, and ever since the HA agent refuses to install.
Assuming the management machine DG is 192.168.0.1, can you ping it from the VSA? From what I can tell, the vmkernel DG and the COS gateway have no effect in this equation. There is a routing problem from the VSA to an outside network. This points me to the network config in the VSA, or an external routing problem in the switch. I would start by double-checking the physical connections, then the port VLAN config on the switch. It seems like the switch is not routing between 192.168.2.0 <-> 192.168.0.0. If it is routing according to the config, then check the physical connections to make sure the cables are plugged into the right ports that operate in the correct VLAN.
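A quick way to narrow this down from the VSA's console (the gateway addresses follow the subnets above; the management host address is a placeholder for one of yours):

```shell
ping -c 3 192.168.2.1     # the VSA's own default gateway; if this fails, fix the VSA's network config
ping -c 3 192.168.0.1     # the gateway on the management subnet; if this fails, the switch isn't routing
traceroute 192.168.0.10   # a management-subnet host (placeholder IP), to see exactly where packets stop
```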
I'm not familiar enough with Lefthand products to give a definitive answer, but I'm pretty sure a VSA is nothing but a regular VM acting as a software iSCSI target. Is it possible there is a firewall service running on the VSA that allows ping from the local subnet only and blocks everything beyond?
I just read your original post. Are you trying to connect to the VSA from outside the 192.168.2.x subnet? If you are, it's not the vmkernel default gateway that needs to be set. It would be the VSA's.
When you use "ping" it's using the service console connection, but "vmkping" will use the vmkernel connection. From what I can tell though, routing should not be any kind of a problem, since you have a SC connection and a vmkernel connection on the same local subnet as the VSA. What happens when you use "vmkping" to ping the VSA?
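Concretely, assuming the VSA answers at 192.168.2.20 (placeholder address; substitute your VSA's IP):

```shell
ping 192.168.2.20      # uses the Service Console network stack
vmkping 192.168.2.20   # uses the vmkernel stack -- the same path the iSCSI traffic takes
```

If the first works and the second doesn't, the problem is in the vmkernel port config rather than in routing.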
Can you please post the output of "esxcfg-vmknic -l" and "esxcfg-nics -l" ? Thanks.