Service console, vMotion network failover and diagram

I have been charged with developing our VMware enterprise architecture. To that end I have produced the attached document.

I have only done the VMware Fast Track training, so my proposals come from a purely academic angle.

Does anyone have an opinion of the decisions I have made?

Regards


Mike


Your decisions are good, for the most part. I did see that you had management and vMotion on individual NICs. I would recommend having redundant paths to both of these segments as well as your production traffic. If you're not going to be running huge amounts of I/O, which I'm anticipating you are not, then dedicating 4 ports to your prod network is a bit of overkill. Also, if you're only using ports off the one 4-port card, then you're setting yourself up for failure in case that card fails.

These servers will include 2 on board, and 4 on your card, giving you 6 total.

I would use 2 for vmotion, 2 for mgmt, and 2 for prod network.  This  should give you optimal redundancy, and should still give you sufficient  network bandwidth.

Your physical network environment will come into play as well. If you're using multiple pSwitches, then make sure each of your teamed NICs goes to a separate pSwitch.
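For illustration only, here is roughly what that 2 / 2 / 2 split could look like from a classic ESX service console, assuming six NICs visible as vmnic0-vmnic5 and making up the port group names and the vMotion address (check your own NIC numbering, and pair each onboard port with a port from the quad card so a single card failure cannot take out a whole team):

    # Management: vSwitch0 normally already carries the Service Console port group
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -L vmnic2 vSwitch0

    # vMotion pair
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic1 vSwitch1
    esxcfg-vswitch -L vmnic3 vSwitch1
    esxcfg-vswitch -A "VMotion" vSwitch1
    esxcfg-vmknic -a -i 10.0.1.11 -n 255.255.255.0 "VMotion"

    # Production pair
    esxcfg-vswitch -a vSwitch2
    esxcfg-vswitch -L vmnic4 vSwitch2
    esxcfg-vswitch -L vmnic5 vSwitch2
    esxcfg-vswitch -A "Production" vSwitch2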

Good luck, and welcome to the forums.


-KjB


At first glance that all looks pretty well thought out to me. One question: why are you deciding to use iSCSI for your SAN instead of fibre?


Mike, I will critique your proposal and give you answers via Private Message.


Tom Howarth
VMware Communities User Moderator


Tom any chance you could post them here for us all to have a read of?


Hello,

Welcome to the forums..... A few concerns...

You have redundancy for SC and iSCSI, and triple redundancy for your Production network, but not for your vMotion network. If it were me, I would move one of the Production pNICs over to make the vMotion network redundant. Also, you really should have 2 pSwitches for Management and vMotion for the best network redundancy, unless these are all VLANs spread across multiple switches.

Not sure if the triple redundancy is to provide anything special.

The other item is the use of ESXi... It has some security issues, and you may wish to read the Forum for posts on why not to use ESXi. If you are not all that concerned with security, or you find the restrictions it imposes acceptable, it's a fine solution. Your plan should at least address all these concerns; security does not seem to be a part of the plan, and it should be from the beginning.

Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and  Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO  Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization


The R805 is one of Dell's first virtualisation-optimised servers; it has 4 onboard GigE ports.

The only areas where I have used multiple pSwitches are the production and storage networks. I was happy to have downtime on the Management network in the event of a switch failure, as I can swap that out if I need to with no loss of service on the production network, but I guess for the price of a switch I might double up.


Do I really need (as the VMware training material suggests) separate networks for vMotion and Management?


WGardiner,

As we need to put in two systems, it's primarily a cost issue; the expense of a redundant switch fabric and HBAs would push the budget somewhat.


Also, we have no experience with Fibre Channel.


The single NIC for vMotion was a recommendation by the trainer on the VMware training, as he said the only risk was a loss of vMotion. But I can now see a scenario where the NIC with the vMotion port fails and you need to put the host into maintenance mode to repair it, but you can't vMotion the VMs off because the port is down.

The triple redundancy was mainly because I wasn't sure what else to do with the spare port; I was under the impression that, rather than pure redundancy, the outbound traffic would be load balanced over the 3 NICs.


I was led to believe that ESXi is actually more secure? Whilst at VMware in Frimley they said that if we had no existing infrastructure then ESXi was the thing to go for, as it was more secure (no local service console) and was "the future".


Maybe it's a bit too soon to go for ESXi?


Unless you change the default behavior, loss of the management network means no service console, which means the isolation response triggers, which will shut down your VMs in an attempt by HA to kick in and fail them over to another host. You can modify your isolation response to leave the VMs up and running, but having a redundant service console would make just as much sense.
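For context, and only as a sketch (not from KjB's post): by default HA decides it is isolated when it can no longer ping the service console's default gateway, and the test address can be supplemented or replaced via the cluster's HA advanced options in VirtualCenter. The gateway address below is made up.

    # From the ESX service console: confirm the host can reach the address
    # HA will ping when it suspects isolation (the default gateway by default)
    route -n                # shows the service console default gateway
    ping -c 3 192.168.10.1  # hypothetical gateway address

    # In VirtualCenter (cluster > Edit Settings > VMware HA > Advanced Options)
    # the isolation test address can be pointed elsewhere, e.g.:
    #   das.isolationaddress = 192.168.10.1
    #   das.usedefaultisolationaddress = false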

You don't typically have to have separate networks for management and vMotion, and to that end, you can use a teamed set of pNICs for both vMotion and management. Just set the NIC order to active/standby for the management port group, and standby/active for the vMotion port group.

That way, you will be fully redundant.
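A rough sketch of that shared-pair layout, assuming vmnic0/vmnic1 are the two teamed uplinks and using a made-up vMotion address. The per-port-group failover order itself is set in the VI Client (port group > Edit > NIC Teaming > failover order) rather than from the command line:

    # One vSwitch, two uplinks, two port groups
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -L vmnic1 vSwitch0
    esxcfg-vswitch -A "VMotion" vSwitch0
    esxcfg-vmknic -a -i 10.0.2.11 -n 255.255.255.0 "VMotion"

    # Then, in the VI Client, override the failover order per port group:
    #   Service Console port group: vmnic0 active, vmnic1 standby
    #   VMotion port group:         vmnic1 active, vmnic0 standby

With that arrangement each port group has a dedicated NIC in normal operation but can fail over to the other, which is the redundancy described above.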

-KjB


I see a few things that need to be taken into account.

1) You appear to be using iSCSI for your ESX storage. ESX requires iSCSI (and all NAS traffic) to use the VMkernel NICs. From the diagram, it looks like you will have three VMkernel NICs, with one being used for vMotion. Now, you can use multiple VMkernel NICs, but make sure you only use one active on your storage pair. ESX has had trouble handling multiple network paths on VMkernel networks.


2) In your two-node cluster you have what appears to be a 'cross-over' between your two ESX hosts for VMkernel. You need to specify a gateway device that both members can ping. This is important for determining whether your ESX servers go into 'isolation mode'. If you lose the ability to hit your gateway, both members of your cluster can shut down, and you lose any built-in HA. (This gateway to ping, and isolation mode, will affect all servers; I just note it here because of the cross-over.)


3) I think you are hitting overkill on the number of NICs used, unless you are not using VLAN tagging on the vSwitches. Multiple VLANs can be run on a single active/active trunk, and I haven't seen one ESX host fill a 2 Gb Ethernet link (2x1 Gb).
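A minimal sketch of the VLAN tagging described above (virtual switch tagging), assuming the physical switch ports are already configured as 802.1Q trunks; the VLAN IDs and port group names are invented for the example:

    # Two port groups sharing one trunked, two-uplink vSwitch
    esxcfg-vswitch -A "Production" vSwitch1
    esxcfg-vswitch -v 20 -p "Production" vSwitch1   # tag Production traffic as VLAN 20

    esxcfg-vswitch -A "Management" vSwitch1
    esxcfg-vswitch -v 30 -p "Management" vSwitch1   # tag Management traffic as VLAN 30

    esxcfg-vswitch -l                               # list vSwitches/port groups to verify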


-Andrew Stueve


Thanks Andrew,

2) I thought the "pinging of the gateway" occurred over the management LAN? I had planned to set the gateway address for HA to be the VirtualCenter server.


3) We don't use VLAN tagging. We've only ended up with 2x GigE for redundancy; I doubt we are likely to hit the limits.


We are probably going to end up with iSCSI HBAs, which might free up even more NICs. I think we'll stick with the onboard 4 and the additional 4 for scalability/redundancy.


First off, nice writeup!

I don't see a connection between the 2 iSCSI switches. You will need the switches connected with multiple Gb links (LACP/PAgP); there is nothing stopping EthX on the ESX server from wanting to go to eth2 on the EQL boxes after it has been redirected away from the group IP address. If that traffic needs to traverse from switch 1 to switch 2, the iSCSI connection will not work.

One other thing: since you will have a SAS and a SATA EQL box in the group, you will want to put them in separate pools within the group. The current EQL firmware recommends against mixed-speed drives in the same pool.

Ben


Thanks for your feedback ben,

You make a good point with the switches I had forgotten that.


It's an interesting point about the EQL boxes needing to be in different pools, as the tech support from Dell didn't seem to know that.


They also didn't know how long it would be before the firmware is upgraded so that both storage controllers are active (at the same time). Any ideas?


Hi Mike,

There isn't a hard stop when attempting to add mixed-speed storage into a single pool. You will get a warning message, and when the data is spread across 2 members you'll have 50% of your volume on 7.2K and 50% on 10K. That's asking for trouble if you run into any performance issues.

As for the dual-active controller question, I think that you should plan  on utilizing the current active/passive features for the foreseeable  future.

Ben


Hello,

If you have a spare pNIC, then adding it to your vMotion network and adding some redundancy to the management switch would be my main changes to the design. Redundancy should be paramount; as Kjb007 has stated, lack of access to the SC ports means no management capability, which includes vMotion.

Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and  Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO  Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization


The first thing I noticed was confusion as to which version of ESX you are going to deploy; you mention ESXi and Foundation edition in the same paragraph. It becomes apparent as we read on, but use the correct nomenclature.

See the first paragraph of page 2: ESXi is the version that can be installed on a flash device; however, it only has experimental HA support and no other higher functions such as DRS and vMotion. Again, clarification is required, as you go on to mention vMotion networks, VCB and Service Consoles.

If you are having a physical server for VC, then consider a lower-spec machine. Also, unless there is complete physical separation between the two networks, make sure that just one of your VCs always does the guest deployments, so as to prevent the possibility of a duplicate MAC address being generated.


Are you sure of the ability to load balance the two SANs, as they have disparate storage capacity? (I am not familiar with this particular manufacturer.)


Initial Split deployment design


Consider two cables to your VC in your production environment.


I would remove one of the NICs from your VM network and add it to your VMkernel network to gain resilience.


The rest of the issues have been covered by the other posters

Tom Howarth
VMware Communities User Moderator


ESXi supports DRS and VMotion, and since ESX 3.5.0 Update 1 it also supports HA.


I stand corrected on DRS and vMotion, but what about VCB?


Tom Howarth
VMware Communities User Moderator


To my knowledge (and I have yet to see anything to the contrary) VCB is supported in ESXi / ESX embedded. As far as I know, all versions of ESX 3.5.0 (installable / embedded) should have feature parity. I know that HA was an odd item that was left out of the support matrix for ESXi when 3.5.0 came out, and that there are no CIM providers currently in ESX 3.5.0, but I believe the rest of the features/support are the same.


I didn't think I had mentioned foundation edition anywhere in the document?

"VMware ESXi 3.5 Enterprise has been selected and will be installed onto a SD card or USB stick inside the

servers."


This has been selected based on VMware's recommendations; they see it as the future direction for their product line. As the previous poster has stated, if you buy the Enterprise licence you can deploy either ESX, ESXi Installable or ESXi Embedded. The trainer on the Fast Track VMware training course claimed that VCB was supported.


That said, other comments and further research I have conducted suggest that it might still be a little early to adopt ESXi.


Dell claimed there were no problems load balancing between EqualLogic SANs of different sizes or spindle speeds; however, one of the other posters has reported otherwise. I will further evaluate this decision once we are in a position to proceed with the combined deployment.


I think I will now use the same virtually teamed and physically redundant GigE infrastructure for both vMotion and Management; does anyone perceive any problems with this?


You are correct, you did not; however, you did mention ESXi and Enterprise in the same sentence. The two are from different product sets.

Also, ESXi's management is not as easy. The RCLI is not as feature rich; it does not even include a kill command to give you the ability to shut down a hung VM from the command line.


Tom Howarth
VMware Communities User Moderator


If you press ALT+F1, type the hidden "unsupported" command and enter the administrator password, then uncomment the #ssh line in the /etc/inetd.conf file, you will be able to use the kill command and the full set of administrative service console tools.
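Roughly, the steps look like this on an ESXi 3.5 host (this is the unsupported Tech Support Mode, so use it at your own risk; the PID shown is just an example):

    # At the ESXi console: press ALT+F1, type "unsupported" (nothing is echoed),
    # then enter the root password to get a shell.

    vi /etc/inetd.conf      # remove the leading '#' from the ssh line

    ps | grep inetd         # find the inetd process ID
    kill -HUP 1234          # example PID - make inetd re-read its config
                            # (or simply reboot the host)

    # You can then ssh in as root and use kill and the other Busybox tools.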

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant


Small, but I wanted to contribute...

Remember to open the ports on the ESX firewall for iSCSI.
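On classic ESX (where there is a service console firewall) that is a single command; shown here only as a sketch, since ESXi embedded has no COS firewall to open:

    esxcfg-firewall -e swISCSIClient   # opens outbound tcp/3260 for the software iSCSI client
    esxcfg-firewall -q swISCSIClient   # check that the service is now allowed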


Run VC on a physical server; though running it as a VM is supported, based on my experience I wouldn't do it.

Kaizen!

This document was generated from the following thread: Peer Review of ESX Architecture
