VMware Cloud Community
francisbandi
Enthusiast

2nd Service console - Best Practices

Hi

I am planning to implement a 2nd service console to provide resilience in case one of the SCs is not available due to a network outage.

My environment is as follows

VC 2.5 U2 / ESX 3.5 U2 running on blades with 4 NICs.

We have vSwitch 0 with two pNICs to carry management traffic, i.e. traffic for VM management, Service Console 0 and vMotion.

Our vSwitch 1 is for the production network and is attached to two pNICs.

It has one port group called Production.

Now we want to create a 2nd service console on vSwitch 1, with a port group Service Console 1.

We do use a separate IP as the das.isolationaddress.

Our expectation is as follows

If for any reason we lose service console 0 (due to a network outage):

1. We want to keep our VMs up and running, since service console 1 is still alive.

2. We are still able to manage the virtual environment via VC (no disconnect of the ESX host from VC).

Is anyone running a 2nd service console as proposed above?

Do you have any best practice guidelines?

Any pros and cons on the proposed solution above?

Appreciate any help.

15 Replies
Troy_Clavell
Immortal

A second service console is one way, but you could also set your VM's to "leave powered on" in the event of an HA isolation. You could use your vMotion NIC as an isolation NIC as well or increase the default isolation timeout.

A second COS will help the view in vCenter, but if your isolation response is set to leave VM's powered on, you can still manage them while the host appears as "not responding" during the isolation event. If you lose a host for any reason other than a simple isolation, a second COS really isn't going to give you much.

My personal preference if not using iSCSI is to only run a single COS and use the HA advanced options. Below is a good description of all your options.

francisbandi
Enthusiast

Troy

Thanks for the quick reply.

VMs can be left powered on with the advanced option, without a 2nd service console.

However, we want to have access to the VMs via Virtual Center so that we can still manage the VMs / host for performance data.

We do not want to see the host disconnected.

This is the main reason why we want this 2nd service console.

There were times when we lost an entire subnet / VLAN due to human error, switch failure, core failure...

We still want to maintain access to ESX via Virtual Center.

So I need help on:

1 - Do I need a 2nd das isolation address? Some say it is not really needed.

2 - Do I need a 2nd vMotion interface (please note I have vMotion on vSwitch 0)? How does DRS behave if there is no vMotion interface?

3 - How do I test these HA failover scenarios with two service consoles before production?

4 - How do Service Console 0 and Service Console 1 talk to each other and monitor each other, or fail over to each other, etc.?

Azriphale
Enthusiast

Use the power off VMs option if you are using iSCSI storage. If your ESX has become isolated from the network you are not going to read or write to iSCSI. If you are using FC then by all means, leave VMs powered on or shut down gracefully.

In answer to the questions...

  1. HA will try to contact the other ESX hosts' service consoles in the cluster for 15 seconds. Should it fail to contact any of them, it will attempt to reach the das.isolationaddress. If you have a second isolation address, it will attempt to contact that as well. If that fails, it will assume it is isolated. These isolation addresses are unrelated to the number of service consoles. If you do add a second service console, you should probably raise the isolation response time to 20 seconds to give ESX a chance to try the backup.

  2. If you have no Vmotion port then neither you nor DRS will be able to vmotion. Given that your vmkernel ports should all be on the same broadcast domain, I would recommend looking at nic teaming rather than adding another port. You can set any additional nics to be backups and connect them to separate switches. My understanding is that this would cause less overhead on the ESX hosts.

  3. The only realistic way to test HA is to pull some network cables.

  4. I am not 100% sure about this. I think that ESX swaps a list of its service console addresses when HA is configured. From there, it works out the various options for communication by some form of route discovery. All I do know is you don't need to route between the service console LANs.
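
For readers following along, the check order described in point 1 can be sketched in Python. This is only an illustration of the sequence as described in this post, not VMware's actual HA agent logic; the function name, the addresses, and the ping callback are all hypothetical.

```python
from typing import Callable, Iterable


def is_isolated(
    peer_heartbeat_seen: bool,
    isolation_addresses: Iterable[str],
    can_ping: Callable[[str], bool],
) -> bool:
    """Return True only if no other host answered during the detection
    window AND none of the isolation addresses answered a ping."""
    if peer_heartbeat_seen:
        # At least one other host in the cluster is reachable.
        return False
    # No peers reachable: fall back to the isolation address(es).
    return not any(can_ping(addr) for addr in isolation_addresses)


# Hypothetical example: the first network is down, the second still works.
reachable = {"10.0.200.1"}
addresses = ["10.0.100.1", "10.0.200.1"]
isolated = is_isolated(False, addresses, lambda addr: addr in reachable)
```

With these made-up values `isolated` is `False`, because the second address still answers. If `reachable` were empty, the host would declare itself isolated and apply its isolation response.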

And having two service consoles is a very good idea.

HTH

Azi

Troy_Clavell
Immortal

1 - Do I need a 2nd das isolation address? Some say it is not really needed.

yes, and that would be set up through the advanced options for HA

2 - Do I need a 2nd vMotion interface (please note I have vMotion on vSwitch 0)? How does DRS behave if there is no vMotion interface?

I would create the second COS on the same vSwitch as your VMkernel Port

3 - How do I test these HA failover scenarios with two service consoles before production?

disable the NIC for the COS, or pull the cable

4 - How do Service Console 0 and Service Console 1 talk to each other and monitor each other, or fail over to each other, etc.?

common gateway.

francisbandi
Enthusiast

Azi...

What do you mean by teaming rather than adding another port? (I have 2 pNICs for vSwitch 1 already.)

You also said I cannot have vmkernel ports on two separate broadcast domains.

This means I am stuck with no vMotion / DRS associated with the 2nd service console.

Technically, I can have a 2nd service console with a 2nd DAS, which will keep the ESX server available within Virtual Center if there is a network outage on the other network where I have my primary service console and vMotion network.

What I get from this type of configuration is access to the ESX server environment, but no DRS and vMotion until I restore my primary network (with the vMotion interface).

Is this a correct assumption, or do you have any other thoughts?

java_cat33
Virtuoso

I would use the method Troy advised - create your second COS on the same vswitch (vswitch0) where the vmkernel is configured for vmotion. In this setup, your second COS would be on the same subnet as your vmotion network. However, as the service console can only have one gateway, the gateway of the second COS would be that of the primary.

Then for the das.isolationaddress2 - this should specify a reliable hardware device on the same subnet as the second COS.
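
As a rough illustration of the advice above, the following Python sketch checks that each isolation address sits on the same subnet as one of the service consoles. The interface names, IP addresses, and helper name are made-up examples, not values from this thread; only the option names come from the discussion.

```python
import ipaddress


def off_subnet_isolation_addresses(service_consoles, isolation_addresses):
    """Return the names of isolation-address options whose IP is not on
    the subnet of any service console interface."""
    networks = [
        ipaddress.ip_interface(cidr).network
        for cidr in service_consoles.values()
    ]
    return [
        option
        for option, addr in isolation_addresses.items()
        if not any(ipaddress.ip_address(addr) in net for net in networks)
    ]


# Hypothetical addressing: one service console per subnet.
service_consoles = {
    "vswif0": "10.0.100.11/24",
    "vswif1": "10.0.200.11/24",
}
isolation_addresses = {
    "das.isolationaddress": "10.0.100.1",
    "das.isolationaddress2": "10.0.200.1",
}

problems = off_subnet_isolation_addresses(service_consoles, isolation_addresses)
```

An empty `problems` list means every isolation address can be pinged without routing from at least one service console.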

francisbandi
Enthusiast

There is no point having a 2nd service console on the same subnet.

If there is an issue on that subnet, you will have the server disconnected from VC.

We are trying to provide access to the ESX environment via VC if there is an issue with the primary service console.

And we want to hear from the community on the pros and cons of setting up a 2nd service console on another subnet (the data network).

We have set up our primary service console with redundant pNICs. Still, there will be situations where the entire subnet is gone at the core level.

So we are planning to provide another means to reach the ESX servers from Virtual Center, for performance metrics and so on, while the other network is down.

Now I have another challenge: providing a DAS isolation address, which should be pingable from both service consoles, I believe.

Troy_Clavell
Immortal

You HAVE to have a common gateway. If you try to create a second COS anywhere with a different gateway, you will get an error. That is why I said, if you are going to have two, create your primary, then the secondary using the vmkernel gateway.

java_cat33
Virtuoso

There is no point having a 2nd service console on the same subnet.

What?? If you are limited in nics, then you're gonna probably need to use the same subnet as the vmotion network for the secondary SC

If there is an issue on that subnet, you will have the server disconnected from VC.

So? At least your VM's haven't powered off as the secondary SC is still online

We are trying to provide access to the ESX environment if there is an issue with the primary service console.

You can still manage the host via physical console or HP iLO for example

And we want to find the pros and cons of setting up a 2nd service console on another subnet (the data network).

Setting the second SC on the data network will work, however it's best practise to keep your management traffic away from production traffic.

francisbandi
Enthusiast

My thoughts are as follows.

- I will have my 2nd service console on the data path, which will only come into use if the other network carrying my primary service console is dead.

- I will lose my DRS and vMotion capability, but my ESX servers still do not disconnect from Virtual Center.

This means I am still able to pull my performance graphs, see my memory and CPU usage, and monitor the VMs during such network failures.

I will only have one DAS, which can be reached from the primary service console.

I may not need another DAS for the 2nd SC, as long as heartbeats happen between the service consoles on the data network.

Any advice on such an approach?

Azriphale
Enthusiast

Nic teaming is the way to provide network redundancy. If you are concerned about losing switches/routers, this is where you need to look. The point of a second service console is not network redundancy. It is there in case you kill the first by some bad configuration - particularly the routing or firewall components.

Vmotion occurs through the vmkernel ports. Routing vmk traffic is a really bad idea as iSCSI/nfs and/or vmotion needs to be low latency. So vmkport1 on esx1 should not be expected to connect to vmkport2 on esx2 (assuming you have a consistent naming convention). So why not use the physical resources you would assign to a second vmk port to improve the network redundancy of vmk port 1?

As long as you have a free vSwitch, you can keep creating vmkports or service consoles. But why would you? Each adds an overhead to your esx host. And you are not going to get any better redundancy from it, as you are still relying on your physical switching infrastructure to carry the communication to the other esx hosts in the cluster.

So configure a second service console on a separate broadcast domain. That is a good plan. You could configure a second DAS if you want but I would not in your environment. Your esx hosts have five addresses to check for isolation already, adding another address will just slow the HA process down further. To get better redundancy for your vmk ports, plough your time and money into improving the infrastructure and adding physical nics.

With four physical nics, I reckon you are better off grouping them as follows.

vSwitch0 - 2 physical nics
    Primary service console - mgmt vlan
    VM Network - client access vlan

vSwitch1 - 2 physical nics
    Backup service console - backup mgmt vlan
    Vmk port - iSCSI/nfs/vmotion vlan
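
The grouping above can also be written down as data and sanity-checked. This is a minimal Python sketch, assuming hypothetical vmnic names and a made-up `check_layout` helper; it only verifies two properties discussed in this thread (two uplinks per vSwitch, service consoles on different vSwitches).

```python
# Proposed layout; the vmnic names are assumptions for illustration.
layout = {
    "vSwitch0": {
        "pnics": ["vmnic0", "vmnic1"],
        "port_groups": {
            "Primary service console": "mgmt vlan",
            "VM Network": "client access vlan",
        },
    },
    "vSwitch1": {
        "pnics": ["vmnic2", "vmnic3"],
        "port_groups": {
            "Backup service console": "backup mgmt vlan",
            "Vmk port": "iSCSI/nfs/vmotion vlan",
        },
    },
}


def check_layout(layout):
    """Return a list of human-readable problems found in the layout."""
    problems = []
    for name, vswitch in layout.items():
        if len(vswitch["pnics"]) < 2:
            problems.append(f"{name}: fewer than 2 physical nics")
    # vSwitches that carry at least one service console port group.
    sc_switches = [
        name
        for name, vswitch in layout.items()
        if any("service console" in pg.lower() for pg in vswitch["port_groups"])
    ]
    if len(sc_switches) < 2:
        problems.append("service consoles are not spread across two vSwitches")
    return problems


layout_problems = check_layout(layout)
```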

Given the choice I would get an extra couple of nics. I would use these to move the VM network and client access vlan onto a separate vSwitch.

Hope that makes sense.

Azi

Rubeck
Virtuoso

For the hosts not to show as disconnected, vCenter has to receive UDP packets on port 902 originating from the ESX host's SC connection...

All SC connections on a host share the same routing table. The default route to your vCenter server is out using vswif0 (unless a vswif1 has been added later on to the same subnet as the vCenter server).

For this to change vswif0 would need to go "down".... Even then, no updating of routing table on host would happen.
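
To make the shared routing table point concrete, here is a small Python sketch of ordinary longest-prefix route selection. The subnets, vCenter addresses, and the `pick_interface` helper are hypothetical; it models generic IP routing, not the service console's actual implementation.

```python
import ipaddress


def pick_interface(routes, destination):
    """Pick the outgoing interface by longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, iface in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, iface))
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0])[1]


# One routing table shared by both service console interfaces.
routes = [
    ("10.0.100.0/24", "vswif0"),  # directly connected
    ("10.0.200.0/24", "vswif1"),  # directly connected
    ("0.0.0.0/0", "vswif0"),      # the single default gateway
]

via_routed = pick_interface(routes, "10.0.50.5")    # vCenter on another subnet
via_local = pick_interface(routes, "10.0.200.5")    # vCenter on vswif1's subnet
```

In this sketch traffic to a vCenter on some other subnet always leaves via vswif0 (the default route), and only a vCenter on the same subnet as vswif1 would be reached through vswif1.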

1: The "das.isolationaddress2" option can be useful if it is set to an address reachable by a secondary Service Console. It all depends on the setup and need..

2: You can only have one vMotion enabled vmkernel port configured per host. (At least, I think)

3: If you want to test HA out with multiple SC ports you can individually disable these using the esxcfg-vswif command and see the host and cluster responses generated in host and VC logs.

(Need access to console, iLO or similar)

4: Nope.. They're just two separate IP enabled NICs assigned to a VM. In this case let's call the VM "Service Console" ;)

My two cents..

/Rubeck

Azriphale
Enthusiast

A further thought occurs...

Your secondary service console (SSC) will never be used by virtual center regardless of where you put it or how you address it. VC will only use the hostname or ip address used to register the esx host. Ok, you could change the hostname in dns or hosts files but you will probably end up causing yourself a load of other issues - HA would probably have kittens. The SSC would be accessed by using ssh/telnet or by pointing the vi client directly at it. Bear in mind that the SSC probably won't be able to route traffic to you - I open an ssh session to a switch on the same broadcast domain as the SSC and use that to open another session out to the SSC.

However, the service console does get used for isolation testing by HA.

Azi

francisbandi
Enthusiast

Azi

This is very useful. I gave another 6 points.

This is exactly what we are trying to do:

2nd service console on a different broadcast domain (VLAN).

We do not want to defeat the purpose of HA by enabling an option to keep VMs powered on even if that network is down.

A few more questions.

Can we configure which cluster communication network is primary for heartbeats and DAS pings?

My configuration will be as follows:

vSwitch0 - 2 pNICs configured for a broadcast domain / VLAN, say VLAN100

service console 0

VM Management

vMotion - (private network)

DAS1

vSwitch1 - 2 pNICs configured for a 2nd broadcast domain / VLAN, say VLAN200

service console 2

production network / data network as we say

DAS2

In case of a disaster where we lose the entire VLAN100 (it happens, and has happened, even though we have a fully redundant network all the way up to the core), I lose my DRS and vMotion functionality. We are OK to be without this for a temporary period.

So my question is: can I configure which cluster communication network will be primary?

In my case above, I want my primary cluster communication network to be VLAN200, with DAS2.

How can I do that in the advanced configuration?

Azriphale
Enthusiast

Thanks for the points!

I think I see what you are trying to do. Am I correct in saying you want to get to a position where if you lose your vm client network on an esx host everything vmotions over to another esx so clients can continue connecting?

This is really outside of what VMware can achieve at the moment - you might want to look at the Fault Tolerance functionality that should be coming in vSphere. I can't think of a way of getting this done inside esx or virtual center. ESX will automatically use any and all service consoles to determine whether it has become isolated. From memory, it also gets a bit annoyed if you muck up the service console on vSwitch0 - the only times I have seen this I ended up rebuilding the esx host as it was quicker than fixing all the issues it caused. I suppose you could configure all your vswitch0 service consoles to be on separate ip subnets so that they were only accessible from the VC - something with a 255.255.255.252 netmask for example. This would introduce a load of complexity though and you only introduce more capacity for errors when you introduce complexity.
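
For anyone unsure what a 255.255.255.252 netmask gives you, this short Python check shows it is a /30 with exactly two usable host addresses, i.e. room for one service console and one peer. The 192.168.10.0 network is an arbitrary example, not an address from this thread.

```python
import ipaddress

# A /30 written with its dotted netmask; the network address is made up.
net = ipaddress.ip_network("192.168.10.0/255.255.255.252")

prefix_length = net.prefixlen            # 30
usable = [str(h) for h in net.hosts()]   # ['192.168.10.1', '192.168.10.2']
```

Every host would need its own such subnet, which is where the extra complexity mentioned above comes from.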

You could go looking for a monitoring system that would monitor your esx hosts - you would probably need to run an ssh script or similar to move the VMs if there was an issue. Unfortunately DRS makes this complicated as you really have no idea where the VMs are at any point in time. I don't know of a system that would do this - I suspect someone out there does...

The one thing about your configuration that does concern me a little is that you say you have assigned the vlans to the physical nics. You would be much better off configuring the network ports as trunks and configuring separate vlans on the separate port groups inside esx. Ideally, vm traffic should be partitioned from service console and vmk traffic.

I would really recommend looking at the setup of your systems outside of your vmware in the short term. In a correctly configured environment, your network shouldn't be a point of failure unless you have lost power and your ups batteries/generator fuel has run out - in which case your VMs not connecting is the least of your worries. Spend a few quid on a couple of Cisco 3750 switches, stack them together with stackwise and configure cross stack ether-channel for your uplinks to esx hosts and your network core - so each connection is replicated on the second switch. I use vlan interfaces to do the routing out of the switches so if one fails, the other takes over pretty much instantly. Power them off different circuits and only something pretty major will knock them over.

You could also try using MS clustering inside vmware but that's a whole different can of worms.

Azi
