Service console, vMotion network failover and diagram

I have been charged with developing our VMware enterprise architecture. To that end I have produced the attached document.

I have only done the VMware Fast Track training, so my proposals come from a purely academic angle.

Does anyone have an opinion of the decisions I have made?

Regards


Mike


Your decisions are good, for the most part. I did see that you had management and vMotion on individual NICs. I would recommend having redundant paths to both of these segments as well as your production traffic. If you're not going to be running huge amounts of I/O, which I'm anticipating you are not, then dedicating 4 ports to your prod network is a bit of overkill. Also, if you're only using ports off the one 4-port card, then you're setting yourself up for failure in case that card fails.

These servers will include 2 on board, and 4 on your card, giving you 6 total.

I would use 2 for vmotion, 2 for mgmt, and 2 for prod network.  This  should give you optimal redundancy, and should still give you sufficient  network bandwidth.

Your physical network environment will come into play as well. If you're using multiple pSwitches, then make sure each of your teamed NICs goes to a separate pSwitch.
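For illustration only, here is roughly what that 2 / 2 / 2 split could look like from a classic ESX service console, assuming six NICs visible as vmnic0-vmnic5 and making up the port group names and the vMotion address (check your own NIC numbering, and pair each onboard port with a port from the quad card so a single card failure cannot take out a whole team):

    # Management: vSwitch0 normally already carries the Service Console port group
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -L vmnic2 vSwitch0

    # vMotion pair
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic1 vSwitch1
    esxcfg-vswitch -L vmnic3 vSwitch1
    esxcfg-vswitch -A "VMotion" vSwitch1
    esxcfg-vmknic -a -i 10.0.1.11 -n 255.255.255.0 "VMotion"

    # Production pair
    esxcfg-vswitch -a vSwitch2
    esxcfg-vswitch -L vmnic4 vSwitch2
    esxcfg-vswitch -L vmnic5 vSwitch2
    esxcfg-vswitch -A "Production" vSwitch2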

Good luck, and welcome to the forums.


-KjB


At first glance that all looks pretty well thought out to me. One question: why are you deciding to use iSCSI for your SAN instead of fibre?


Mike, I will critique your proposal and give you answers via Private Message.


Tom Howarth
VMware Communities User Moderator


Tom any chance you could post them here for us all to have a read of?


Hello,

Welcome to the forums..... A few concerns...

You have redundancy for SC and iSCSI, and triple redundancy for your Production network, but not for your vMotion network. If it were me, I would move one of the Production pNICs over to make the vMotion network redundant. Also, you really should have 2 pSwitches for Management and vMotion for the best network redundancy, unless these are all VLANs spread across multiple switches.

Not sure if the triple redundancy is to provide anything special.

The other item is the use of ESXi... It has some security issues, and you may wish to read the Forum for posts on why not to use ESXi. If you are not all that concerned with security, or you find the restrictions it imposes acceptable, it's a fine solution. Your plan should at least address all these concerns; security does not seem to be a part of the plan, and it should be from the beginning.

Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and  Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO  Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization


The R805 is one of Dell's first virtualisation-optimised servers; it has 4 onboard GigE ports.

The only areas where I have used multiple pSwitches are the production and storage networks. I was happy to have downtime on the Management network in the event of a switch failure, as I can swap that out if I need to with no loss of service on the production network, but I guess for the price of a switch I might double up.


Do I really need (as the VMware training material suggests) separate networks for vMotion and Management?


WGardiner,

As we need to put in two systems, it's primarily a cost issue; the expense of a redundant switch fabric and HBAs would push the budget somewhat.


Also, we have no experience with Fibre Channel.


The single NIC for vMotion was a recommendation by the trainer on the VMware training, as he said the only risk was a loss of vMotion. But I can now see a scenario where the NIC with the vMotion port fails and you need to put the host into maintenance mode to repair it, but you can't vMotion the VMs off because the port is down.

The triple redundancy was mainly because I wasn't sure what else to do with the spare port; I was under the impression that, rather than pure redundancy, the outbound traffic would be load balanced over the 3 NICs.


I was led to believe that ESXi is actually more secure? Whilst at VMware in Frimley they said that if we had no existing infrastructure then ESXi was the thing to go for, as it was more secure (no local service console) and was "the future".


Maybe it's a bit too soon to go for ESXi?


Unless you change the default behavior, loss of the management network means no service console, which means the isolation response triggers, which will shut down your VMs in an attempt by HA to kick in and fail them over to another host. You can modify your isolation response to leave the VMs up and running, but having a redundant service console would make just as much sense.
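For context, and only as a sketch (not from KjB's post): by default HA decides it is isolated when it can no longer ping the service console's default gateway, and the test address can be supplemented or replaced via the cluster's HA advanced options in VirtualCenter. The gateway address below is made up.

    # From the ESX service console: confirm the host can reach the address
    # HA will ping when it suspects isolation (the default gateway by default)
    route -n                # shows the service console default gateway
    ping -c 3 192.168.10.1  # hypothetical gateway address

    # In VirtualCenter (cluster > Edit Settings > VMware HA > Advanced Options)
    # the isolation test address can be pointed elsewhere, e.g.:
    #   das.isolationaddress = 192.168.10.1
    #   das.usedefaultisolationaddress = false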

You don't typically have to have separate networks for management and vMotion, and to that end, you can use a teamed set of pNICs for both vMotion and management. Just set the NIC order to active/standby for the management port group, and standby/active for the vMotion port group.

That way, you will be fully redundant.
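A rough sketch of that shared-pair layout, assuming vmnic0/vmnic1 are the two teamed uplinks and using a made-up vMotion address. The per-port-group failover order itself is set in the VI Client (port group > Edit > NIC Teaming > failover order) rather than from the command line:

    # One vSwitch, two uplinks, two port groups
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -L vmnic1 vSwitch0
    esxcfg-vswitch -A "VMotion" vSwitch0
    esxcfg-vmknic -a -i 10.0.2.11 -n 255.255.255.0 "VMotion"

    # Then, in the VI Client, override the failover order per port group:
    #   Service Console port group: vmnic0 active, vmnic1 standby
    #   VMotion port group:         vmnic1 active, vmnic0 standby

With that arrangement each port group has a dedicated NIC in normal operation but can fail over to the other, which is the redundancy described above.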

-KjB


I see a few things that need to be taken into account.

1) You appear to be using iSCSI for your ESX storage. ESX requires iSCSI (and all NAS traffic) to use the VMkernel NICs. From the diagram, it looks like you will have three VMkernel NICs, with one being used for vMotion. Now, you can use multiple VMkernel NICs, but make sure you only use one active on your storage pair. ESX has had trouble handling multiple network paths on VMkernel networks.


2) In your two-node cluster you have what appears to be a 'cross-over' between your two ESX hosts for VMkernel. You need to specify a gateway device that both members can ping. This is important for determining whether your ESX servers go into 'isolation mode'. If you lose the ability to hit your gateway, both members of your cluster can shut down, and you lose any built-in HA. (This gateway to ping, and isolation mode, will affect all servers; I just note it here because of the cross-over.)


3) I think you are hitting overkill on the number of NICs used, unless you are not using VLAN tagging on the vSwitches. Multiple VLANs can be run on a single active/active trunk, and I haven't seen one ESX host fill a 2 Gb Ethernet link (2x1 Gb).
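A minimal sketch of the VLAN tagging described above (virtual switch tagging), assuming the physical switch ports are already configured as 802.1Q trunks; the VLAN IDs and port group names are invented for the example:

    # Two port groups sharing one trunked, two-uplink vSwitch
    esxcfg-vswitch -A "Production" vSwitch1
    esxcfg-vswitch -v 20 -p "Production" vSwitch1   # tag Production traffic as VLAN 20

    esxcfg-vswitch -A "Management" vSwitch1
    esxcfg-vswitch -v 30 -p "Management" vSwitch1   # tag Management traffic as VLAN 30

    esxcfg-vswitch -l                               # list vSwitches/port groups to verify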


-Andrew Stueve


Thanks Andrew,

2) I thought the "pinging of the gateway" occurred over the management LAN? I had planned to set the gateway address for HA to be the VirtualCenter server.


3) We don't use VLAN tagging. We've only ended up with 2x GigE for redundancy; I doubt we are likely to hit the limits.


We are probably going to end up with iSCSI HBAs, which might free up even more NICs. I think we'll stick with the onboard 4 and the additional 4 for scalability/redundancy.


First off, nice writeup!

I don't see a connection between the 2 iSCSI switches. You will need the switches connected with multiple Gb links (LACP/PAgP); there is nothing stopping EthX on the ESX server from wanting to go to eth2 on the EQL boxes after it has been redirected away from the group IP address. If that traffic needs to traverse from switch 1 to switch 2, the iSCSI connection will not work.

One other thing: since you will have a SAS and a SATA EQL box in the group, you will want to put them in separate pools within the group. The current EQL firmware recommends against mixed-speed drives in the same pool.

Ben


Thanks for your feedback ben,

You make a good point with the switches I had forgotten that.


It's an interesting point about the EQL boxes needing to be in different pools, as the tech support from Dell didn't seem to know that.


They also didn't know how long it would be before the firmware is upgraded so that both storage controllers are active (at the same time). Any ideas?


Hi Mike,

There isn't a hard stop when attempting to add mixed-speed storage into a single pool. You will get a warning message, and when the data is spread across 2 members you'll have 50% of your volume on 7.2K and 50% on 10K. That's asking for trouble if you run into any performance issues.

As for the dual-active controller question, I think that you should plan  on utilizing the current active/passive features for the foreseeable  future.

Ben


Hello,

If you have a spare pNIC, then adding it to your vMotion network and adding some redundancy to the management switch would be my main changes to the design. Redundancy should be paramount; as Kjb007 has stated, lack of access to the SC ports means no management capability, which includes vMotion.

Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and  Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO  Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization


The first thing I noticed was confusion as to which version of ESX you are going to deploy; you mention ESXi and Foundation edition in the same paragraph. It becomes apparent as we read on, but use the correct nomenclature.

See the first paragraph of page 2: ESXi is the version that can be installed on a flash device; however, it only has experimental HA support and no other higher functions such as DRS and vMotion. Again, clarification is required, as you go on to mention vMotion networks, VCB and Service Consoles.

If you are having a physical server for VC, then consider a lower-spec machine. Also, unless there is complete physical separation between the two networks, make sure that just one of your VCs always does the guest deployments, so as to prevent the possibility of a duplicate MAC address being generated.


Are you sure of the ability to load balance the two SANs, as they have disparate storage capacity? (I am not familiar with this particular manufacturer.)


Initial Split deployment design


Consider two cables to your VC in your production environment.


I would remove one of the NICs from your VM network and add it to your VMkernel network to gain resilience.


The rest of the issues have been covered by the other posters

Tom Howarth
VMware Communities User Moderator


ESXi supports DRS and VMotion, and since ESX 3.5.0 Update 1 it also supports HA.


I stand corrected on DRS and vMotion, but what about VCB?


Tom Howarth
VMware Communities User Moderator


To my knowledge (and I have yet to see anything to the contrary) VCB is supported in ESXi / ESX embedded. As far as I know, all versions of ESX 3.5.0 (installable / embedded) should have feature parity. I know that HA was an odd item that was left out of the support matrix for ESXi when 3.5.0 came out, and that there are no CIM providers currently in ESX 3.5.0, but I believe the rest of the features/support are the same.


I didn't think I had mentioned foundation edition anywhere in the document?

"VMware ESXi 3.5 Enterprise has been selected and will be installed onto a SD card or USB stick inside the

servers."


This has been selected based on VMware's recommendations; they see it as the future direction for their product line. As the previous poster has stated, if you buy the Enterprise licence you can deploy either ESX, ESXi Installable or ESXi Embedded. The trainer on the Fast Track VMware training course claimed that VCB was supported.


That said, other comments and further research I have conducted suggest that it might still be a little early to adopt ESXi.


Dell claimed there were no problems load balancing between EqualLogic SANs of different sizes or spindle speeds; however, one of the other posters has reported otherwise. I will further evaluate this decision once we are in a position to proceed with the combined deployment.


I think I will now use the same virtually teamed and physically redundant GigE infrastructure for both vMotion and Management; does anyone perceive any problems with this?


You are correct, you did not; however, you did mention ESXi and Enterprise in the same sentence. The two are from different product sets.

Also, ESXi's management is not as easy. The RCLI is not as feature rich; it does not even include a kill command to give you the ability to shut down a hung VM from the command line.


Tom Howarth
VMware Communities User Moderator


If you press ALT+F1, type the hidden "unsupported" command and enter the administrator password, then uncomment the #ssh line in the /etc/inetd.conf file, you will be able to use the kill command and the full set of administrative service console tools.
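Roughly, the steps look like this on an ESXi 3.5 host (this is the unsupported Tech Support Mode, so use it at your own risk; the PID shown is just an example):

    # At the ESXi console: press ALT+F1, type "unsupported" (nothing is echoed),
    # then enter the root password to get a shell.

    vi /etc/inetd.conf      # remove the leading '#' from the ssh line

    ps | grep inetd         # find the inetd process ID
    kill -HUP 1234          # example PID - make inetd re-read its config
                            # (or simply reboot the host)

    # You can then ssh in as root and use kill and the other Busybox tools.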

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant


Small, but I wanted to contribute...

Remember to open the ports on the ESX firewall for iSCSI.
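On classic ESX (where there is a service console firewall) that is a single command; shown here only as a sketch, since ESXi embedded has no COS firewall to open:

    esxcfg-firewall -e swISCSIClient   # opens outbound tcp/3260 for the software iSCSI client
    esxcfg-firewall -q swISCSIClient   # check that the service is now allowed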


Run VC on a physical server; though running it as a VM is supported, based on my experience I wouldn't do it.

Kaizen!

This document was generated from the following thread: Peer Review of ESX Architecture
