10 ESX 3 Servers
1 SAN with 4TB of storage
Each ESX Server has 3 dual port GIG nics and two Fiber cards for SAN.
I am only concerned with configuring networking for vm, console, and VMkernel. Fiber cards are already planned out (primary and failover back to SAN switches).
Let me give you a little more info on nics so you clearly understand the plan. Nic1 (onboard) has ports (0 and 1), Nic2 (PCI slot) has ports (2 and 3) and Nic3 (PCI slot) has ports (4 and 5).
port 0 and 1 for primary teamed virtual machine traffic switch
port 2 and 4 for failover teamed virtual machine traffic switch
port 3 for console
port 5 for VM Kernel
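For what it's worth, the layout above could be scripted from the service console roughly like this. This is only a sketch: it assumes the six ports enumerate as vmnic0-vmnic5 in the same order as the plan, the port-group names and IPs are made up, and since port 5 is listed twice in the plan, the failover team is taken as ports 2 and 4.

```shell
# Sketch only: assumes ports 0-5 enumerate as vmnic0-vmnic5.
# Port-group names and IPs are invented examples.

# Primary VM traffic switch, teamed on ports 0 and 1
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic0 vSwitch1
esxcfg-vswitch -L vmnic1 vSwitch1
esxcfg-vswitch -A "VM Network" vSwitch1

# Failover VM traffic switch (assuming ports 2 and 4,
# since port 5 appears twice in the plan above)
esxcfg-vswitch -a vSwitch2
esxcfg-vswitch -L vmnic2 vSwitch2
esxcfg-vswitch -L vmnic4 vSwitch2
esxcfg-vswitch -A "VM Network Failover" vSwitch2

# Service console on port 3
esxcfg-vswitch -a vSwitch3
esxcfg-vswitch -L vmnic3 vSwitch3
esxcfg-vswitch -A "Service Console" vSwitch3
esxcfg-vswif -a vswif0 -p "Service Console" -i 192.168.1.10 -n 255.255.255.0

# VMkernel (VMotion) on port 5
esxcfg-vswitch -a vSwitch4
esxcfg-vswitch -L vmnic5 vSwitch4
esxcfg-vswitch -A "VMotion" vSwitch4
esxcfg-vmknic -a "VMotion" -i 192.168.2.10 -n 255.255.255.0
```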
Anyway, let me know what you think.
Two areas I am grey on:
-Is one gig port enough for VM Kernel?
-Is it OK not to have failover for the console? Is console traffic strictly traffic into the web console interface plus traffic from the ESX box to VirtualCenter?
Any input as usual is appreciated! VMware rules!
As a general rule, I team ports across NICs and not both on one NIC. My first subnet would be NIC2 port 2 and NIC3 port 4. A second subnet would be NIC2 port 3 and NIC3 port 5.
Are you doing switch bonding at the switch on different cards or switches? This would give you another layer of redundancy.
I'm still kinda new to this, but hope it opens more discussion...
You make a good point. I will most likely take that suggestion and work it into my design. I am going to call VMware support and ask about the VMKernel traffic and also about the console traffic. I'll update this post.
Take a look at your performance tab for networking. You will see that most of the time you never even come close to 50% utilization of gig bandwidth. Having more than 1 pnic on your vswitch is for failover, not out of need for bandwidth.
As a general rule, test your bandwidth using esxtop (press n for the networking view) to make sure you are getting good speeds.
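esxtop can also dump samples to a file in batch mode for offline review; the interval and iteration count here are arbitrary examples.

```shell
# Capture 12 esxtop samples at 5-second intervals for offline review.
# (Interactively, run plain esxtop and press 'n' for the network view.)
esxtop -b -d 5 -n 12 > /tmp/esxtop-net.csv
```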
Good point. The performance data is shocking. So you are basically saying even if you had the extra nics already in the machines you wouldn't use them? Or maybe use them for VMkernel traffic?
Ok, on our GIG network it takes on average 15 seconds to move a 500 MB file. This works out to about 33 MB/s on our GIG connections. You are correct that using two GIG connections would be a waste if all it would give me is up to 66 MB/s. Like I said before, I saw at most 6 Mbps in performance data on a host running 18 servers. So maybe I will have this setup:
one port primary vm traffic
one port fail over vm traffic
one port console traffic
one port secondary console traffic
two ports for vmkernel traffic!
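The back-of-the-envelope transfer math above works out like this (500 MB in 15 seconds is megabytes per second; multiply by 8 for megabits):

```shell
# 500 MB moved in 15 seconds, in MB/s and megabits/s
MB=500
SECS=15
echo "$((MB / SECS)) MB/s"          # 33 MB/s
echo "$((MB * 8 / SECS)) Mbit/s"    # 266 Mbit/s -- about a quarter of a GigE link
```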
Isn't this fun? I love learning new things. I think it makes sense to have at least two pnics in any given vswitch; this gives you redundancy in the event of a hardware failure. Use 2 pnics that aren't on the same physical card, of course, and maybe even on different pswitches (again, redundancy).
But yes, from a bandwidth point of view, unless you are running some sort of networking beast, you won't use a gig of bandwidth.
Just to be clear, you are proposing using two pnics per vswitch in a "primary" and "standby" config for VM traffic, right?
Can you speak to the amount of bandwidth you need for the vmkernel?
I have a call into vmware now.
Sure. ESX will use one or the other; surprisingly, ESX doesn't do load balancing. I guess they haven't really put that feature in, knowing how little traffic traverses a pnic.
But yeah, I would say put at least 2 physical nics in each virtual switch.
I don't know how much bandwidth you need for the vmkernel, but I can't imagine it being much more than your regular ports.
That's a pretty sound networking config--I agree with earlier posts that I would create my nic teams across pNics. The best practice is to have the service console on a private VLAN, segmented from the virtual machines (since if you compromise the console you can do serious damage, such as destroying data stores). The same goes for VMotion (VMkernel). Teaming the service console is a good idea, especially if you are going to use HA. I've seen some configurations that combine the service console and VMkernel ports on a teamed pair of nics, separating the traffic with VLAN tagging.
I'm a fan of using IP-hash load balancing on my nic teams, using Cisco EtherChannel on the switch. True, network utilization isn't usually near capacity, but why not actively utilize all available nics while still providing redundancy?
VMware support frowns upon having a single vSwitch with pNics from different vendors. However, I have never had a problem, and how else can you provide true redundancy? If you have a vSwitch with two pNics on a single PCI card (in order to obtain uniformity from a vendor perspective) and that PCI card dies or becomes loose, that vSwitch and all attached VMs are now without network. One other thing to watch out for is vSwitch0. By default, it will be set to a maximum of 26 virtual ports (this is the same as switch ports). I would bump this up to 52 just in case, if you put VM traffic on it. It requires a reboot, so it's better to do this during your build rather than when you need it asap.
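As a sketch of that: `esxcfg-vswitch` accepts a port count when creating a new vSwitch, so you can size it up front. As far as I know, an existing vSwitch0's port count still has to be changed in the VI Client properties, followed by the reboot.

```shell
# Create a new vSwitch with 64 ports up front instead of the default.
# (The :ports suffix applies only at creation time; to grow an existing
# vSwitch0 you change its properties in the VI Client and reboot.)
esxcfg-vswitch -a vSwitch1:64
esxcfg-vswitch -l    # list switches and confirm the port count
```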
I would consider having failover for the console too. The heartbeat packets are transferred via the service console, and if the NIC or link is down, VirtualCenter thinks the host isn't available anymore and starts to move your VMs to another ESX (if HA is used).
VMware support frowns upon having a single vSwitch with pNics from different vendors.

That's not quite true anymore.
After attending Jacob Jensen's presentation at Nice, I spoke with him and he said it is better to team across chipsets.
This way you are covered in the event of a chipset failure, onboard or on the PCI card.
I asked him why he deviated from the best practice of teaming exclusively per chipset, and he said the best practices are going to be revised.
The old statement came from wrongly interpreted info from service calls.
He shows the configuration in the PowerPoint presentation of his TSX session, "High Performance ESX Networking", slide 10.
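To see which card or chipset each vmnic actually lives on before building teams, the service console can list the physical nics with their PCI bus locations:

```shell
# List physical NICs: name, PCI bus address, driver, link state, speed,
# and MAC -- makes it easy to spread a team across different cards/chipsets.
esxcfg-nics -l
```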
I implemented the following setup at a customer (very large environment):

Service Console: VMNIC0 - Active, VMNIC2 - Standby
VMotion: VMNIC2 - Active
VM traffic: VMNIC1 - Active, VMNIC4 - Active

VMNIC0 and VMNIC1 are onboard nics.
VMNIC2 - VMNIC5 are PCI nics (two nics on one card; in total 2 PCI cards are used).
The service console normally uses vmnic0; if this port fails, it will use vmnic2.
The VMotion port can only use vmnic2; if that port fails, no VMotion is possible.
The customer uses HA, so we need to be sure that the service console always has a connection.
We do not want an isolation response because of a network hiccup.
The network admin wasn't happy about seeing all the different mac-addresses for the ESX machine, so we had to be sure that only one route to the switch was active. This is the reason we gave the second nic in the team standby status.
By setting the status of the COS nic to unused for the VMotion port, we were sure that no VMotion activity would disrupt the HA heartbeats when the active VMotion nic failed. If the nic fails, NIC6 will be manually added to the vSwitch0 config.
NIC1 and NIC4 are both active in one port group. They are not aggregated into one big pipe. Link aggregation is not possible in ESX.
NIC5 is for all the VLAN configs.
VLAN-configured networks are not used much at that site, so a 1 Gb nic is more than enough.
Ok, had a conversation with a VMware rep today. He seemed very knowledgeable. Here is the gist of my convo with him:
-DO NOT team across NICs from different manufacturers.
-DO NOT use standby; just use teaming. For example, if you have four ports and you want two to be teamed and two to be standby, just put all of them in one huge team. He said teaming does the same thing as standby: if one goes down, the team just uses the rest.
-The way he explained teaming, it was more like load balancing.
-One GIG nic for the console is plenty, and one GIG nic for the vmkernel is plenty. If you have two, just put console and vmkernel on them and team them.
-VMkernel is mostly used for transferring the in-use memory when a virtual machine is VMotioned.
-If you have more nics and network ports available, use them up as one huge team for VM traffic.
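Pulling the rep's advice together, a hypothetical layout for one of these six-port hosts might look like the sketch below. The names and IPs are invented, and the "team" is simply multiple uplinks linked to the same vSwitch with the default policy.

```shell
# Sketch only: vmnic numbering, port-group names, and IPs are examples.

# VM traffic: four ports in one big team on a single vSwitch
esxcfg-vswitch -a vSwitch1
for nic in vmnic0 vmnic1 vmnic2 vmnic3; do
    esxcfg-vswitch -L $nic vSwitch1
done
esxcfg-vswitch -A "VM Network" vSwitch1

# Console and VMkernel share a teamed pair on another vSwitch
esxcfg-vswitch -a vSwitch2
esxcfg-vswitch -L vmnic4 vSwitch2
esxcfg-vswitch -L vmnic5 vSwitch2
esxcfg-vswitch -A "Service Console" vSwitch2
esxcfg-vswitch -A "VMkernel" vSwitch2
esxcfg-vswif -a vswif0 -p "Service Console" -i 10.0.0.10 -n 255.255.255.0
esxcfg-vmknic -a "VMkernel" -i 10.0.1.10 -n 255.255.255.0
```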
That's all folks. I think I have my questions answered on the network side, on to the SAN and LUN sizing questions!!!!
Ehhhmmm, as far as I know, VMotion uses the port group which is enabled for VMotion to transfer the memory in use on the host. All the data is already on shared storage; that data isn't copied.
It looks like VMware isn't communicating internally as well as they should be. At presentations they tell you to team across different manufacturers' NICs, while support recommends against it.
Console and VMkernel are the same thing, do you mean Console and VMotion?
It becomes a huge team if you team them, but they are always separate connections. It does not aggregate bandwidth. When you place two 1 gig nics in a team, you do not end up with one path to a 2 gig connection. You get paths to 2 separate 1 gig connections. It can use one of them at a time. It can alternate between connections.
As far as vmkernel and the console: I am talking about the three things you can add to a vswitch, namely VM traffic, VMkernel, and the console. According to the docs and the tech, the console traffic and the VMkernel traffic are different.
I believe that the "running" memory of the VM being VMotioned, which resides in the host's RAM, is exchanged via the vmkernel vswitch. So no data (as in VMDK data) is exchanged; that is on the SAN. It's only the running memory transactions that are exchanged between hosts.