Preparing a migration + expansion, here's the list of what I have to work with:
The setup is going to run several fairly high-load websites on IIS + MS SQL (both production and development), Exchange 2007 for about a hundred users, and a SQL back end for several internal applications. It will reside in a colocation facility; all users are remote, accessing it either via the public internet or site-to-site VPN links. IIS runs on several load-balanced servers and uses NetApp CIFS shares for shared storage.
Current plan is:
Possible failures that I'm accounting for:
There is no budget for a second site or a tape library with offsite tape storage to protect against site failures; this is a known risk that the management is aware of.
This is based on a recommendation from Duncan Epping's HA section in his HA/DRS Technical Deepdive book. And also here:
http://www.yellow-bricks.com/2011/03/22/esxi-management-network-resiliency/
The first glaring thing I see is having your management network on the same vSwitch as your VM traffic. It is best practice to segregate them. I will throw out some other suggestions when I have a little more time, but at a quick glance, that is what I saw first.
It's segregated into a different VLAN - from what I've read, that should be sufficient.
Edit: Would it be better to put it on the same vSwitch with VMotion? Because I really would prefer to segregate it from storage traffic.
Hi
bmekler wrote:
It's segregated into a different VLAN - from what I've read, that should be sufficient.
Nope, it is not sufficient; you should separate mgmt traffic from VM traffic at the hardware level as well (different vmnics and vSwitch). Why? Because the hosts exchange heartbeats over the mgmt interface, and if the VMs suddenly put a high load on the mgmt vmnics, some heartbeats might get lost and the host can end up flagged as isolated from the LAN.
Cheers
Artur
As Artur stated, it is recommended to segregate the traffic physically as well. This gives the management network free rein regardless of what your VM traffic is doing.
So, with 8 pNICs total - two of them allocated to VM traffic, two to VMotion, two to NFS/CIFS and two to VM iSCSI - where would you put the management network? And is it possible to put the NFS VMkernel port, the CIFS port group, and the iSCSI1 and iSCSI2 port groups on the same vSwitch with two pNICs, using route based on originating virtual port ID on the vSwitch, active/standby failover order on NFS and CIFS (vmnic2 active, vmnic6 standby), active/unused on iSCSI1 (vmnic2 active, vmnic6 unused), and the reverse on iSCSI2 (vmnic6 active, vmnic2 unused)? Or will STP detect a loop and shut down one of the ports?
Edit: Note that the VM network is the one that receives the absolute least traffic. This is in a remote facility with no local users, and the feed is 20 Mbps with 100 Mbps burst capability. Barring very abnormal circumstances, VM network traffic can't even approach saturating a gigabit link.
Nope, it is not sufficient; you should separate mgmt traffic from VM traffic at the hardware level as well (different vmnics and vSwitch).
What I want to say here is that you should separate mgmt (vMotion and console) traffic from any other traffic, not only VM traffic. For mgmt you should create a separate vSwitch with two vmnics in an active/passive scheme, e.g.:
vSwitch1 --- mgmt ------ vmnic0 - Active
                         vmnic1 - Passive
         --- vMotion --- vmnic0 - Passive
                         vmnic1 - Active
vmnic0 - connected to pSwitch0
vmnic1 - connected to pSwitch1
Of course, VLAN trunking is used for both networks.
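If you want to script it rather than click through the vSphere Client, a rough pyVmomi sketch of that layout could look like this (untested; the vCenter/host names, credentials, vSwitch name and VLAN IDs are just placeholders, and the mgmt/vMotion VMkernel interfaces themselves would still be added separately with AddVirtualNic):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='admin', pwd='***',
                  sslContext=ssl._create_unverified_context())
host = si.content.searchIndex.FindByDnsName(dnsName='esx1.example.local', vmSearch=False)
net_sys = host.configManager.networkSystem

# Dedicated vSwitch backed by the two mgmt uplinks, one per physical switch.
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=['vmnic0', 'vmnic1']))
net_sys.AddVirtualSwitch(vswitchName='vSwitch1', spec=vss_spec)

def add_portgroup(name, vlan, active, standby):
    # Explicit failover order, flipped between the two port groups,
    # so mgmt and vMotion each prefer a different uplink.
    teaming = vim.host.NicTeamingPolicy(
        policy='failover_explicit',
        nicOrder=vim.host.NicOrderPolicy(activeNic=active, standbyNic=standby))
    spec = vim.host.PortGroup.Specification(
        name=name, vlanId=vlan, vswitchName='vSwitch1',
        policy=vim.host.NetworkPolicy(nicTeaming=teaming))
    net_sys.AddPortGroup(portgrp=spec)

add_portgroup('Management Network', 9, ['vmnic0'], ['vmnic1'])
add_portgroup('vMotion',            4, ['vmnic1'], ['vmnic0'])
Disconnect(si)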
Gotta run home now; if you want, we can do some drawing later on 🙂
BTW, you should replace the dual-port NICs with quad-port NICs; in your scenario, 6 NICs is the minimum, I think.
Cheers
Artur
Artur wrote:
vSwitch1 --- mgmt ------ vmnic0 - Active
                         vmnic1 - Passive
         --- vMotion --- vmnic0 - Passive
                         vmnic1 - Active
vmnic0 - connected to pSwitch0
vmnic1 - connected to pSwitch1
So you suggest putting management together with VMotion, not application traffic? I wanted to keep VMotion apart from everything else, seeing as it's the one thing I have that is nearly guaranteed to saturate a gigabit link whenever it's triggered, unlike application traffic, which is mild.
Also, will that scheme work with non-stacked switches? I have a suspicion that STP will detect a loop there and disable either vmnic0 or vmnic1.
Artur wrote:
BTW, you should replace the dual-port NICs with quad-port NICs; in your scenario, 6 NICs is the minimum, I think.
The R710 servers running vSphere do have 8 NICs; 2950 servers with four NICs are running physical Windows, not vSphere.
I have had this setup in deployments I have done with no issue at all.
Can I do the same thing with VM port groups? Say, set LAN on vmnic0 active/vmnic1 standby and DMZ1 on vmnic1 active/vmnic0 standby? What should the vSwitch be set to? Route based on the originating virtual port ID?
No, the portgroups will utilize the pNIC team. And yes, you would need to set the policy to Route based on originating virtual port ID.
Right, so if I configure as follows:
Switch 1, ports 1 and 24 set to VLAN5 and VLAN8 tagged
Switch 2, ports 1 and 24 set to VLAN5 and VLAN8 tagged
Cable goes between ports 24 on two switches
ESX1, vSwitch0 is set to use vmnic0 and vmnic1, all settings default: both adapters active, route based on originating virtual port ID, failure detection link status only, notify switches yes, failback yes; vmnic0 plugged into switch 1 port 1, vmnic1 plugged into switch 2 port 1
VM port group DMZ1, created on vSwitch0, tag 5, override vSwitch failover order selected, vmnic0 active, vmnic1 standby
VM port group LAN, created on vSwitch0, tag 8, override vSwitch failover order selected, vmnic1 active, vmnic0 standby
All traffic on DMZ1 will go through vmnic0 to switch 1, and all traffic on LAN will go through vmnic1 to switch 2, right? And if one of the switches goes down, the affected network will fail over to the surviving switch?
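If it helps to see it in config form, this is roughly how I'd script those two overrides with pyVmomi once the port groups exist (an untested sketch; host name and credentials are placeholders):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='esx1.example.local', user='root', pwd='***',
                  sslContext=ssl._create_unverified_context())
host = si.content.searchIndex.FindByDnsName(dnsName='esx1.example.local', vmSearch=False)
net_sys = host.configManager.networkSystem

def pin_portgroup(pg_name, vlan, active, standby):
    # Keep the vSwitch default load balancing (route based on originating
    # virtual port ID) and just override the NIC order for this port group.
    teaming = vim.host.NicTeamingPolicy(
        policy='loadbalance_srcid',
        nicOrder=vim.host.NicOrderPolicy(activeNic=active, standbyNic=standby))
    spec = vim.host.PortGroup.Specification(
        name=pg_name, vlanId=vlan, vswitchName='vSwitch0',
        policy=vim.host.NetworkPolicy(nicTeaming=teaming))
    net_sys.UpdatePortGroup(pgName=pg_name, portgrp=spec)

pin_portgroup('DMZ1', 5, ['vmnic0'], ['vmnic1'])   # prefers switch 1
pin_portgroup('LAN',  8, ['vmnic1'], ['vmnic0'])   # prefers switch 2
Disconnect(si)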
You are correct but you will want to set failback to no for the vmnics.
James Bowling wrote:
you will want to set failback to no for the vmnics.
Why? If I understand it correctly, if I set failback to no, then after the first failover, traffic segregation will be gone for good (or until a host reboot, I guess). Suppose switch 2 drops: vmnic1 goes down and LAN traffic moves to vmnic0, sharing it with DMZ1. Switch 2 comes back, but LAN traffic stays on vmnic0. If switch 1 then goes down, both LAN and DMZ1 end up on vmnic1 on switch 2.
Also, I always thought that this kind of configuration would cause spanning tree to detect a loop (three switches, each plugged into the other two) and shut down one of the links. Quite surprised to find out otherwise. I'll have to test the hell out of it before putting it into production.
This is based on a recommendation from Duncan Epping's HA section in his HA/DRS Technical Deepdive book. And also here:
http://www.yellow-bricks.com/2011/03/22/esxi-management-network-resiliency/
The article you linked mentions VLAN trunking. Unless I'm reading it wrong, this means that the switches are stacked, a trunk is configured across the two switches, and the two vmnics plug into that trunk. This is not possible in my setup, as ProCurve E2510 switches are not stackable.
VLAN trunking has nothing to do with stacked switches. It is simply a way to allow a group of ports, or a single port, to carry multiple VLANs over a common interface.
I see. So, revised plan at vSphere host level:
This way, the load is spread to various degrees across all eight NICs in the host, with every connection being redundant.
Hi
- VMkernel port VMotion, tagged 4, use explicit failover order, vmnic1 active, vmnic5 standby
- VMkernel port Management, tagged 9, use explicit failover order, vmnic5 active, vmnic1 standby
- Port group Management, tagged 9, use explicit failover order, vmnic5 active, vmnic1 standby
Looks good, just one question: the Management port group - what type of traffic will it be used for?
It is also good to extend the isolation detection time, das.failuredetectiontime, to 30000 ms (advanced HA settings) and, if possible, to add at least one more IP via das.isolationaddress1 = <IP address>; if the default GW becomes non-pingable, HA will then use the second IP to verify host network isolation.
Artur wrote:
Looks good, just one question: the Management port group - what type of traffic will it be used for?
Some applications running in VMs need to access the management network - IPSentry for OS monitoring, MRTG to keep an eye on network load, Dell IT Assistant to process SNMP traps from hardware, that sort of thing.
Artur wrote:
It is also good to extend the isolation detection time, das.failuredetectiontime, to 30000 ms (advanced HA settings) and, if possible, to add at least one more IP via das.isolationaddress1 = <IP address>; if the default GW becomes non-pingable, HA will then use the second IP to verify host network isolation.
Good suggestion, thanks. My default gateway is a clustered pair of Fortigate-200Bs which theoretically should always be available, but better safe than sorry. I'll use my storage (NetApp FAS2040) as the second isolation address.
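And for reference, this is roughly how I expect to push those two advanced options with pyVmomi (untested sketch; the vCenter address, cluster inventory path and the filer's IP are placeholders):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='admin', pwd='***',
                  sslContext=ssl._create_unverified_context())
cluster = si.content.searchIndex.FindByInventoryPath('Colo/host/Production')

# Merge the two HA advanced options into the cluster's existing DAS config.
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(option=[
        vim.option.OptionValue(key='das.failuredetectiontime', value='30000'),  # 30 s isolation detection
        vim.option.OptionValue(key='das.isolationaddress1', value='10.0.0.20'), # NetApp filer as 2nd check
    ]))
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
Disconnect(si)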