
Many times I have heard my students ask what vCenter HA (VCHA) really is and how it differs from vSphere HA.

vSphere HA is a cluster-level feature that can be enabled to increase the overall availability of the VMs inside a cluster. Whenever an ESXi host crashes, HA restarts the VMs of that failed host on other available resources inside the cluster. HA interacts directly with the HA agent (FDM) on each ESXi host and monitors the state of every host in the cluster by checking its heartbeats. So if a network isolation, partition or outage happens and the host also cannot provide its heartbeat to the shared datastore, HA considers that host failed and restarts its VMs on the remaining hosts.

But vCenter HA is a feature introduced with vSphere 6.5 and relates directly to the vCenter Server Appliance. It creates a clustered deployment of the VCSA in a three-node structure: an Active node (the primary vCenter Server), a Passive node (the secondary vCenter Server that takes over after a failure) and a Witness node (acting as the quorum). It addresses only the availability of the VCSA itself. vCenter HA can be enabled only for the VCSA (because it relies on the native replication mechanism of PostgreSQL), and it provides additional availability for this mission-critical service inside the virtualization infrastructure.

As VMware states, with VCHA enabled a vCenter failure is recovered after about 2 to 4 minutes, depending on the vCenter configuration and inventory size. The VCHA activation process itself can be completed in less than 10 minutes.

Now I want to compare these two features with respect to the related concepts of IT infrastructure:

1. Network Complexity:

The vCenter HA configuration needs a dedicated network that is completely separated from the vCenter management network. To run a VCHA cluster successfully you only need three static IP addresses or dedicated FQDNs, one for each cluster node. (I always prefer to choose a /29 subnet for them.) After an Active node failure, the Passive node automatically takes over the vCenter management traffic and users just need to log in to vCenter again (VPXD through the API or the Web Client).
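
As an illustration only (the subnet and addresses below are hypothetical, not a recommendation for your environment), a /29 plan for the dedicated VCHA network could look like this:

VCHA private network: 172.16.99.0/29
Active node:  172.16.99.1
Passive node: 172.16.99.2
Witness node: 172.16.99.3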

A good vSphere HA operation, on the other hand, depends mostly on the cluster settings, so you don't need any extra network configuration specifically for HA. (In some situations you may still want to separate the host management and vMotion port groups based on network throughput.)

2. Network Isolation:

In a situation where there is partitioning between the hosts of a cluster, if a host also cannot send any heartbeat to the shared datastore, it is considered a failed host, so HA tries to restart all running VMs of that host on other healthy hosts. I want to emphasize that, with respect to the availability of the VMs in a host cluster, there are two mechanisms for detecting failures: network connections (between the hosts and vCenter) and storage communication (datastore heartbeats inside the SAN).

But if there is network segmentation between the vCenter HA nodes, we must look at what is really going on, meaning between which nodes of the cluster the separation happened. If the Active-Passive or even the Active-Witness nodes are still connected, there is no need to worry, because the Active node remains responsible for the VI management operation. But what happens if the Active node is the isolated one? Operationally it drops out of the VCHA cluster and stops servicing, and the Passive node takes over its job.

3. Multiple failures:

In the case of consecutive failures, if there are enough resources (RAM and CPU) inside the cluster, vSphere HA can handle the problem by restarting VMs again and again on other available ESXi hosts. Just remember to check the Admission Control Policy settings so that the cluster can tolerate multiple ESXi host failures.

But with vCenter HA you should know that VCHA is not designed for multiple failures, so after a second failure the VCHA cluster is no longer available or functional.

4. Utilization, Performance and Overhead:

There is a small overhead on the primary vCenter when VCHA is enabled, especially when vCenter Server has many tasks to perform.

The Witness needs the lowest CPU, because it runs only the VCHA service. The Passive node is almost the same, running only VCHA and PostgreSQL replication. There is no particular concern about memory usage.

But if you want vSphere HA to work at its best, you must pay attention to the remaining resources in the cluster, because a bad HA configuration can make the cluster unstable. So for the best performance of the whole cluster you need to calculate the availability rate based on the remaining and used physical resources. Specifying at least two dedicated failover ESXi hosts to deal with failures can be a suitable HA configuration.

 

Source of content inside my personal blog: Undercity of Virtualization: vSphere HA vs vCenter HA

If you want to check the hardware information or more details about your servers before an operation like changing or adding physical resources, you normally need to power off or reboot the server and read the POST information. However, because the server is operational, you may not be able to do that. There is a good command to help you in this situation: smbiosDump.

For instance to check your CPU, memory, NIC and Power configurations:

  1. smbiosDump | grep -A 4 'Physical Memory Array'    # total memory slots and maximum memory size
  2. smbiosDump | grep -A 12 'Memory Device'           # type and size of each slot
  3. smbiosDump | grep -A 12 'CPU'                     # processor details, voltage, clock & cache
  4. smbiosDump | grep -A 4 'NIC'                      # network adapters and iLO details (HP)
  5. smbiosDump | grep -A 3 'Power'                    # power supply and part number
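
If you prefer to review the full hardware details offline, you can simply capture the whole output to a file (the path here is just an example):

# smbiosDump > /tmp/hardware-info.txt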

 

Source of content inside my personal blog: Undercity of Virtualization: Check VMware ESXi Hardware information by smbiosDump

 

Although we prefer to manage all of our deployed hosts inside a single subnet or VLAN, in some situations many of the hypervisors have to be placed on other subnets/VLANs. If there is a way to route the vCenter traffic from its gateway to them, there is no problem; only the traffic required for initial management (incoming TCP 443, TCP 902 in both directions, outgoing UDP 902) must be permitted on your gateway/router/firewall. But if that is not possible because of management or security considerations, you can enter all of the required routes inside the vCenter Server Appliance shell. There are two ways to do that. The first method is using the "route add" command from shell access. For example:

# route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.100.1 dev eth0 

The result of this method is not persistent and will be cleared after a VCSA restart, so it's useful only for testing or temporary situations. If you want to make it permanent, the second way is to edit the *.network file (such as 10-eth0.network) in the path "/etc/systemd/network" and add the intended routes in this form:

[Routes]
Destination=10.10.20.0/24
Gateway=10.10.100.2
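
If you need more than one static route, each route must go into its own [Routes] section. As a sketch with hypothetical subnets, the file could then look like this:

[Routes]
Destination=10.10.20.0/24
Gateway=10.10.100.2

[Routes]
Destination=10.10.30.0/24
Gateway=10.10.100.2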

 

Remember to add each route in a separate [Routes] section as shown above; otherwise it will not work as expected. Then restart the network interface:

# ifdown eth0 && ifup eth0

 

or restart the network daemon with one of these commands:

# systemctl restart systemd-networkd

# service network restart

 

And now if you want to check the results, run:

# route -n

# ip route show

 

Without shell access, if you only log in to the VCSA appliance console (appliancesh), there are several CLI commands for checking and configuring routes that you can use instead.

To see them and how each one is used:

 

> routes.list --help

> routes.add --help

> routes.delete --help

> routes.test --help

 

Note I: There is another file here: "/etc/sysconfig/network/routes"; if you view its content, it shows only the system default gateway, no other routes will be listed there.

Note II: If you want to add routing to your ESXi hosts, just do:

# esxcli network ip route ipv4 add -n 10.10.20.0/24 -g 10.10.100.2
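
To verify the route on the host afterwards, you can list its IPv4 routing table:

# esxcli network ip route ipv4 list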

 

Source of Content inside my personal blog: Undercity of Virtualization: Set Manual Routing for VCSA

In the second part of SDDC Design (based on the VMware Validated Design Reference Architecture Guide) we will review what this book says about data center fabric and networking design:

One of the main mandatory considerations of data center design is networking, covering the communication infrastructure, routing and switching. The development of the modern data center in the era of virtualization leads us to a two-tier DC networking architecture: Leaf and Spine.

  1. Leaf switches, also called ToR (Top of Rack) switches, sit inside the racks; they provide network access for servers and storage, and each leaf node carries identical VLANs, each with a unique /24 subnet.
  2. Spine switches, as the aggregation layer, provide connectivity between the Leaf switches of the racks and also provide redundancy on the Leaf-to-Spine (L2S) links. No redundant link is required between two Spine switches.

This topology provides L2 and L3 fabric transport services:

  1. An L2 switched fabric consisting of Leaf and Spine switches acts as one large switch for massive virtual environments (High-Performance Computing and private clouds). One of the popular switching fabric products is Cisco FabricPath, which provides highly scalable L2 multipath networks without STP. It gives you freedom in the design by allowing you to spread the different VLANs for Management, vMotion or FT logging anywhere in the fabric; as disadvantages, the size of the fabric is limited and it supports only single-vendor fabric switching products.
  2. An L3 switched fabric can mix L3-capable switching products from different vendors and uses the point-to-point L2S links as uplinks by enabling dynamic routing protocols like OSPF, IS-IS and iBGP.

As a critical approach of network virtualization, it's very important to consider the physical-to-virtual (P2V) networking requirements: connectivity and uplinks. So IP-based physical fabrics must have the characteristics below:

  1. Simplicity of network design: identical configuration (NTP, SNMP, AAA, …) and a central management scheme for configuring all switches (physical or virtual).
  2. Scalability of infrastructure: highly dependent on the server and storage equipment and their generated traffic, the total number of uplink ports, link speeds and network bandwidth.
  3. High Bandwidth (HBW): racks usually host different workloads, so the total connections or ports may cause oversubscription (total server-facing bandwidth divided by aggregate uplink bandwidth) on the Leaf (ToR) switches; see the worked example after this list. On the other hand, the number of uplinks from a rack (Leaf switch) to each Spine switch must be the same, to avoid the hotspot phenomenon.
  4. Fault-Tolerant Transport: using more Spine switches reduces the impact of a failure on the fabric. Because of the multipath structure of the L2S connectivity, each additional Spine switch carries a smaller share of the total capacity, so a single switch failure affects less of the network capacity. For maintenance operations on a switch, routing metrics can be changed to make sure traffic passes only over the uplinks that remain available.
  5. Different Quality of Service (QoS): based on the SLA, each type of traffic (Management, Storage and Tenant) has different characteristics such as throughput volume, sensitivity of the data and storage location. QoS values are set by the hypervisors and the physical fabric switching infrastructure must accept them as trusted values. Network congestion can be handled by sequencing and prioritizing, and there is no requirement for re-classification at the server-to-Leaf switch ports. VDS switches support both L2 QoS (Class of Service) and L3 QoS (DSCP marking), and for VXLAN networking the QoS values are copied from the internal packet header to the VXLAN-encapsulated header.
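
As a simple worked example with hypothetical numbers: a rack hosting 20 servers, each with a 10 Gbps NIC, can generate up to 200 Gbps of server-facing traffic; if its Leaf (ToR) switch has 4 x 40 Gbps uplinks to the Spine layer (160 Gbps of aggregate uplink bandwidth), the oversubscription ratio is 200 / 160 = 1.25:1.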

 

 

Source of Content inside my personal blog: https://virtualundercity.blogspot.com/2018/08/vmware-sddc-design-considerations-part.html

Undercity of Virtualization: VMware SDDC Design Considerations - PART Two: SDDC Fabric

VMware SATPs (Storage Array Type Plug-ins) are provided by ESXi for every type of array, especially the arrays of storage vendors listed on the VMware HCL. You can see the list of SATPs by running this command: esxcli storage nmp satp list

As VMware says, the SATP, beside the PSP (Path Selection Plug-in), is part of the NMP (Native Multipathing Plugin), and together they are responsible for array-specific operations. You can see the list of PSPs by running this command: esxcli storage nmp psp list

The only vendor that offers a special PSP is Dell, for the EqualLogic iSCSI array series (there is also EMC PowerPath as an additional multipathing plugin). VMware mentions: "The NMP associates a set of physical paths with a specific storage device/LUN and assigned a default PSP for each logical device based on the SATP associated with the physical paths for that device." So we understand that the SATP is associated with the physical paths to a storage device, while the PSP handles and determines which physical path is used for the I/O requests issued to that storage device.
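
As a small illustration of that split (naa.xxx is a placeholder for a real device identifier), you could check a device and, only if Round Robin is appropriate for your array, change its PSP like this:

# esxcli storage nmp device list -d naa.xxx

# esxcli storage nmp device set -d naa.xxx -P VMW_PSP_RR

The first command shows the SATP and the current PSP of that device, and the second one switches the device to the Round Robin PSP.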

After establishing an SSH session to the host, we followed this procedure step by step:

1. esxcli storage nmp device list

Then find your storage device and its related SATP type, and copy its naa.xxx identifier to use in the next step, adding the rule:

2. esxcli storage nmp satp rule add -s SATP_TYPE -d naa.xxx -o enable_ssd

The next step is to run the reclaim operation:

3. esxcli storage core claiming reclaim -d naa.xxx

And now if you want to check your added rule:

4. esxcli storage nmp satp rule list

Note: You will not see the change on your datastore yet; you need to reboot the host and then check it again.
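
After the reboot, one quick way to re-check the flag (again with naa.xxx as a placeholder) is:

# esxcli storage core device list -d naa.xxx | grep -i ssd

The "Is SSD" field in the output should now report true.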

 

Source of Content inside my personal blog: Undercity of Virtualization: Change datastore disk type that is not detected as a SSD disk

 

What really is a snapshot? Let's break it into more detail. A virtual machine snapshot is a technology used to save a specific state of a VM, with the purpose of preserving the VM's data, its power state and also its virtual memory. You can generate many snapshots to keep different states of your VM; snapshots are required for VM backup procedures and are a great ability in test/pilot scenarios. You can revert to any snapshot state if you need to through the Snapshot Manager. It should be remembered that every change made in the meantime (from the snapshot to the present moment) will be discarded when you revert.
  But what is affected whenever we create a new snapshot, and what are the pros and cons of this feature? In this post I want to describe the virtual machine snapshot feature inside vSphere environments in more detail…
In a more detailed view, a snapshot is exactly a replica copy of the VMDK at a specific moment, so it can be used to recover a system from a failed state. All backup solutions work with snapshots: every time they start a VM backup task they take a snapshot to provide a consistent copy of the VM. So as I said before, snapshot generation provides a copy of the contents of these components:
1.    VM settings (hardware settings or any changes to the VM itself)
2.    VMDK state (data that has been written inside the VM guest OS)
3.    VMEM content (virtual memory such as clipboard or swap contents)

 

So you must be careful when using the revert action, because it will return all of these objects to the snapshot state. During snapshot generation a delta file with a .vmdk extension is created (also called a redo log or delta disk) that acts as a child of its parent .vmdk (the main VMDK before the snapshot was created). The guest OS cannot write to the parent VMDK anymore, and from that point any disk write lands in the delta/child disk. The first child disk is created from its parent, and the next successive snapshots create further children from the latest delta .vmdk in the chain. As its name shows, a delta holds the difference between the current state of the VM disk and the moment the last snapshot was created. From that moment, any change in the VM (and its guest OS) is written to this new VMDK (delta file), so the delta files are as important as their parent.
  But what is the exact content of the snapshot files, and where is data written after taking snapshots? Consider A as the primary VMDK of a virtual machine with no snapshots. The B series (B1, B2) are children of A and the C files (C1, C2) are descendants of B. If you are at the C2 snapshot, the data after reverting to the C1 state consists of the base VMDK (the flat file) plus the previous delta files: A+B1+B2+C1. The flat.vmdk holds the raw data of the base disk, but it is not shown as a separate file when you browse the datastore.
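
As a rough illustration only (the VM name "myvm" is hypothetical; the file names follow the usual VMware naming convention), the datastore folder of a VM with one snapshot could contain:

myvm.vmdk                - descriptor of the base disk (A)
myvm-flat.vmdk           - raw data of the base disk
myvm-000001.vmdk         - descriptor of the first delta/child disk (B1)
myvm-000001-delta.vmdk   - data written after the snapshot was taken
myvm.vmsd                - snapshot database
myvm-Snapshot1.vmsn      - snapshot state (and memory contents, if selected)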

 

Inside the Virtual Machine File System (VMFS), a delta disk acts as a sparse disk, and it's worth knowing how data is stored in virtual disks. There is a mechanism called COW (copy-on-write) for optimizing storage space: nothing is written into the delta VMDK until a data copy actually occurs. I will explain the COW mechanism and sparse disks more deeply in another post.
Now, when you create many snapshots and cause complexity in the parent/child relations between them, you may need to execute a consolidation to reduce this confusing situation. It merges the redo logs/delta VMDKs back into a single VMDK to avoid a complex snapshot-management state. If the child disks are large, the consolidation operation may take a long time.
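
If you prefer the ESXi shell, a sketch of checking and cleaning up a VM's snapshots with vim-cmd could look like this (the VM ID 42 is hypothetical; look it up first):

# vim-cmd vmsvc/getallvms

# vim-cmd vmsvc/snapshot.get 42

# vim-cmd vmsvc/snapshot.removeall 42

The first command lists the VM IDs, the second shows the snapshot tree of the chosen VM, and the third deletes all of its snapshots and merges the delta disks back into the base disk.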
There are also some other files related to the snapshot operation:
VMSN: it is a container for the memory contents of the VM. As VMware says, if the snapshot includes the memory option, the ESXi host writes the memory of the virtual machine to disk. The VM is stunned while the memory is being written, and unfortunately you cannot pre-calculate how long that takes, because it depends on many factors such as disk performance and the size of the memory.
Remember that a VMSN file is always generated even if you don't select the memory option when creating the snapshot, but its size is much smaller in the non-memory case. So the VMSN size is an overhead to include when calculating the total datastore space used by a snapshot.
VMSD: it is the snapshot database and the primary source used by the Snapshot Manager; its content is the relation tree of the snapshots. So snapshot.vmsd holds the current configuration and active state of the virtual machine's snapshots.

Source of Content inside my personal blog: Undercity of Virtualization: Virtual Machine Snapshot Details Investigation - Part 1

 


In the first part of SDDC Design (based on the VMware Validated Design Reference Architecture Guide) I want to speak about the VMware Software-Defined Data Center (SDDC) architecture and discuss its requirements and considerations. It makes sense that you should always regard capacity planning, the scalability approach, extensibility potential and the disaster recovery plan. There must also be a design draft that answers the business needs with intelligent and predictable solutions. Traditionally we called that "Infrastructure as a Service (IaaS)", and the SDDC has extended and varied its usage. This structure includes several layers and modules:

1.      Physical Layer (computing, network and storage): includes the servers and other resources for tenant and edge services.

2.      Virtual Infrastructure Layer: grants access to and assigns control procedures over the physical layer (hypervisors and SAN storage) for provisioning and managing tenant virtual machines. Management tasks, consisting of management of the virtual infrastructure, cloud, SM, BC solutions and security areas, are performed in this layer.

3.      Cloud Management Layer: all service requests are handled through this layer, and the SM, BC and Security components are also related to the CM layer.

4.      Service Management (SM), regardless of the IT infrastructure type, has a key role in service provisioning and request handling. All monitoring, log management and alerting operations also belong to this layer.

5.      Business Continuity (BC) considerations, in addition to the Disaster Recovery (DR) plan, act as an SLA guarantee to make sure your IT resources (hardware/virtual/cloud) are always available, and if any interruption happens, there must be another way to bring your IT environment back online. Every backup and replication solution belongs to this section.

6.      Security considerations will increase infrastructure consistency; this area includes every tool and solution to deal with most internal/external threats and attacks. On the other hand, the modules and components belonging to the other layers require some protective features, so this section is a comprehensive part of the SDDC design.

 

Source of Content inside my personal blog: https://virtualundercity.blogspot.com/2018/03/connect-manage-vcsa-database-postgreql.html Undercity of Virtualization: VMware SDDC Design Considerations - PART One: SDDC Layers