mike_caddy
Enthusiast

Peer Review of ESX Architecture

I have been charged with developing our VMware enterprise architecture. To that end I have produced the attached document.

I have only done the VMware Fast Track training, so my proposals come from a purely academic angle.

Does anyone have an opinion on the decisions I have made?

Regards

Mike

0 Kudos
1 Solution

Accepted Solutions
BenConrad
Expert

Hi Mike,

There isn't a hard stop when attempting to add mixed-speed storage to a single pool. You will get a warning message, and once the data is spread across the two members you'll have 50% of your volume on 7.2K drives and 50% on 10K. That's asking for trouble if you run into any performance issues.

As for the dual-active controller question, I think that you should plan on utilizing the current active/passive features for the foreseeable future.

Ben

View solution in original post

0 Kudos
24 Replies
kjb007
Immortal

Your decisions are good, for the most part. I did see that you had management and vMotion on individual NICs. I would recommend having redundant paths to both of these segments, as well as to your production traffic. If you're not going to be running huge amounts of I/O, which I'm anticipating you are not, then dedicating 4 ports to your prod network is a bit of overkill. Also, if you're only using ports off the one 4-port card, then you're setting yourself up for failure if that card fails.

These servers will include 2 on board, and 4 on your card, giving you 6 total.

I would use 2 for vMotion, 2 for management, and 2 for the prod network. This should give you optimal redundancy, and should still give you sufficient network bandwidth.
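
A rough sketch of how that split might map onto the 2 on-board + 4 add-in ports, pairing each team across the on-board and add-in cards where possible so a single card failure can't take out a whole team (names here are only illustrative):

vSwitch0  Service Console / management   1 on-board port + 1 add-in port
vSwitch1  VMkernel (vMotion)             1 on-board port + 1 add-in port
vSwitch2  Production VMs                 the remaining 2 add-in ports (the one team you can't split across cards)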

Your physical network environment will come into play as well. If you're using multiple pSwitches, then make sure your teamed NICs each go to a separate pSwitch.

Good luck, and welcome to the forums.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
0 Kudos
wgardiner
Hot Shot

At first glance that all looks pretty well thought out to me. One question: why have you decided to use iSCSI for your SAN instead of Fibre Channel?

0 Kudos
TomHowarth
Leadership

Mike, I will critique your proposal and give you answers via private message.

Tom Howarth

VMware Communities User Moderator

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
0 Kudos
wgardiner
Hot Shot

Tom, any chance you could post them here for us all to have a read?

0 Kudos
Texiwill
Leadership

Hello,

Welcome to the forums..... A few concerns...

You have redundancy for the SC, iSCSI, and triple redundancy for your Production network, but not for your vMotion network. If it were me, I would move one of the Production pNICs over to give you a fully redundant vMotion network. Also, you really should have 2 pSwitches for Management and vMotion for the best network redundancy, unless these are all VLANs across multiple switches.

I'm not sure whether the triple redundancy is there to provide anything special.

The other item is the use of ESXi. It has some security issues, and you may wish to read the forum for posts on why not to use ESXi. If you are not all that concerned with security, or you find the restrictions it imposes acceptable, it's a fine solution. Your plan should at least address all these concerns; security does not seem to be a part of the plan, and it should be, from the beginning.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
0 Kudos
mike_caddy
Enthusiast

The R805 is one of Dell's first virtualisation-optimised servers; it has 4 onboard GigE ports.

The only areas where I have used multiple pSwitches are the production and storage networks. I was happy to have downtime on the management network in the event of a switch failure, as I can swap that out if I need to with no loss of service on the production network, but I guess for the price of a switch I might as well double up.

Do I really need (as the VMware training material suggests) separate networks for vMotion and management?

0 Kudos
mike_caddy
Enthusiast

WGardiner,

As we need to put in two systems, it's primarily a cost issue; the expense of a redundant switch fabric and HBAs will stretch the budget somewhat.

Also, we have no experience with Fibre Channel.

0 Kudos
mike_caddy
Enthusiast

The single NIC for vMotion was a recommendation from the trainer on the VMware course, as he said the only risk was a loss of vMotion. But I can now see a scenario where the NIC carrying the vMotion port fails, you need to put the host into maintenance mode to repair it, but you can't vMotion anything off because the port is down.

The triple redundancy was mainly because I wasn't sure what else to do with the spare port; I was under the impression that, rather than just redundancy, the outbound traffic would be load balanced over the 3 NICs.

I was led to believe that ESXi is actually more secure? Whilst at VMware in Frimley they said that if we had no existing infrastructure then ESXi was the thing to go for, as it was more secure (no local service console) and was "the future".

Maybe it's a bit too soon to go for ESXi?

0 Kudos
kjb007
Immortal

Unless you change the default behavior, loss of the management network means no service console, which means the isolation response kicks in and shuts down your VMs so that HA can fail them over to another host. You can modify your isolation response to leave the VMs up and running, but having a redundant service console makes just as much sense.

You don't typically have to have separate networks for management and vMotion, and to that end you can use one teamed pair of pNICs for both. Just set the team active/standby for the management port group, and standby/active for the vMotion port group.

That way, you will be fully redundant.
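
If it helps, a rough sketch of that from the ESX 3.x service console, assuming vSwitch0 already carries the Service Console on vmnic0 and treating the NIC names and IP as placeholders (the per-port-group failover order itself is set on each port group's NIC Teaming tab in the VI Client):

esxcfg-vswitch -L vmnic1 vSwitch0                              # add a second uplink to the management vSwitch
esxcfg-vswitch -A "VMotion" vSwitch0                           # vMotion port group alongside the Service Console
esxcfg-vmknic -a -i 192.168.10.21 -n 255.255.255.0 "VMotion"   # VMkernel interface for vMotion
# Then, per port group in the VI Client:
#   Service Console: vmnic0 active, vmnic1 standby
#   VMotion:         vmnic1 active, vmnic0 standby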

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
AndrewSt
Enthusiast

I see a few things that need to be taken into account.

1) You appear to be using iSCSI for your ESX storage. ESX requires iSCSI (and all NAS traffic) to go over VMkernel NICs. From the diagram, it looks like you will have three VMkernel NICs, with one being used for vMotion. You can use multiple VMkernel NICs, but make sure you only have one active on your storage pair; ESX has had trouble handling multiple network paths on VMkernel networks.

2) In your two-node cluster you have what appears to be a 'cross-over' between your two ESX hosts for VMkernel. You need to specify a gateway device that both members can ping. This is important for determining whether your ESX servers go into 'isolation mode'. If you lose the ability to hit your gateway, both members of your cluster can shut down, and you lose any HA you built in. (The gateway to ping, and isolation mode, affect all servers; I just note it here because of the cross-over. A sketch of the relevant HA settings follows this list.)

3) I think you are hitting overkill on the number of NICs used, unless you are not using VLAN tagging on the vSwitches. Multiple VLANs can be run over a single active/active trunk, and I haven't seen one ESX host fill a 2 Gb Ethernet link (2x1Gb).
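
On point 2, a minimal sketch of the HA advanced settings involved, assuming VirtualCenter 2.5 (set under the cluster's VMware HA > Advanced Options; the address is a placeholder and should be something on the Service Console network that is always reachable, such as the physical gateway):

das.isolationaddress     = 192.168.10.1   # address pinged to decide whether a host is isolated (defaults to the SC default gateway)
das.failuredetectiontime = 15000          # milliseconds of lost heartbeats before the isolation response is triggered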



-Andrew Stueve

----------------------- -Andrew Stueve -Remember, if you found this or any other answer useful, please consider the use of the Helpful or Correct buttons to award points
0 Kudos
mike_caddy
Enthusiast

Thanks Andrew,

2) I thought the "pinging of the gateway" occurred over the management LAN? I had planned to set the gateway for HA to be the VirtualCenter server.

3) We don't use VLAN tagging. We've only ended up with 2x GigE for redundancy; I doubt we will be likely to hit the limits.

We are probably going to end up with iSCSI HBAs, which might free up even more NICs. I think we'll stick with the 4 onboard and the additional 4 for scalability/redundancy.

0 Kudos
BenConrad
Expert

First off, nice writeup!

I don't see a connection between the 2 iSCSI switches. You will need the switches connected with multiple Gb links (LACP/PAgP); there is nothing stopping ethX on the ESX server from wanting to go to eth2 on the EQL boxes after it's been redirected away from the group IP address. If that traffic needs to traverse from switch 1 to switch 2 and there is no inter-switch link, the iSCSI connection will not work.

One other thing: since you will have SAS and SATA members in your EQL group, you will want to put them in separate pools within the group. The current EQL firmware recommends against mixed-speed drives in the same pool.

Ben

0 Kudos
mike_caddy
Enthusiast

Thanks for your feedback, Ben.

You make a good point about the switches; I had forgotten that.

It's an interesting point about the EQL boxes needing to be in different pools, as the tech support from Dell didn't seem to know that.

They also didn't know how long it would be before the firmware is upgraded so that both storage controllers are active at the same time. Any ideas?

0 Kudos
BenConrad
Expert

Hi Mike,

There isn't a hard stop when attempting to add mixed-speed storage to a single pool. You will get a warning message, and once the data is spread across the two members you'll have 50% of your volume on 7.2K drives and 50% on 10K. That's asking for trouble if you run into any performance issues.

As for the dual-active controller question, I think that you should plan on utilizing the current active/passive features for the foreseeable future.

Ben

0 Kudos
Texiwill
Leadership

Hello,

If you have a spare pNIC, then adding it to your vMotion network and adding some redundancy to the management switch would be my main changes to the design. Redundancy should be paramount; as kjb007 has stated, lack of access to the SC ports means no management capability, which includes vMotion.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
0 Kudos
TomHowarth
Leadership

The first thing I noticed was confusion as to which version of ESX you are going to deploy; you mention ESXi and Foundation edition in the same paragraph. It becomes apparent as we read on, but use the correct nomenclature.

See the first paragraph of page 2: ESXi is the version that can be installed on a flash device; however, it only has experimental HA support and none of the other higher functions such as DRS and vMotion. Again, clarification is required, as you go on to mention vMotion networks, VCB and Service Consoles.

If you are having a physical server for VC, then consider a lower-spec machine. Also, unless there is complete physical separation between the two networks, make sure that one of your VCs always does the guest deployments, so as to prevent the possibility of a duplicate MAC address being generated.

Are you sure of the ability to load balance the two SANs, as they have disparate storage capacities? (I am not familiar with this particular manufacturer.)

Initial Split deployment design

Consider two cables to your VC in your production environment.

I would remove one of the NICs from your VM network and add it to your VMkernel network to gain resilience.

The rest of the issues have been covered by the other posters.

Tom Howarth

VMware Communities User Moderator

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
0 Kudos
dominic7
Virtuoso

ESXi supports DRS and vMotion, and since ESX 3.5.0 Update 1 it also supports HA.

0 Kudos
TomHowarth
Leadership

I stand corrected on DRS and vMotion, but what about VCB?

Tom Howarth

VMware Communities User Moderator

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
0 Kudos
dominic7
Virtuoso

To my knowledge (and I have yet to see anything to the contrary), VCB is supported on ESXi / ESX Embedded. As far as I know, all versions of ESX 3.5.0, installable or embedded, should have feature parity. I know that HA was an odd item left out of the support matrix for ESXi when 3.5.0 came out, and that there are no CIM providers currently in ESX 3.5.0, but I believe the rest of the features/support are the same.

0 Kudos