
7 Replies Last post: Feb 8, 2008 4:59 AM by chrisy

Building VMWare farm on NFS / iSCSI with HA

Feb 4, 2008 6:54 PM

Jeremey Wise (Novice, 49 posts since Oct 4, 2004)
I have been speaking with a customer who is looking at building a solution that will run production-class servers. They're assessing iSCSI, NFS, and Fibre Channel.

I am trying to position Fibre Channel for the typical reasons of pipeline throughput (4Gb FC HBA = 320MB pipeline).


I have a few questions about NFS and iSCSI. I believe my understanding is correct, but I need TECHNICAL validation.

1) In VMware ESX 3.5's support of NFS or iSCSI, the data passes through the service console, and though it uses a different vmkernel module it relies on the console interface. This interface cannot be made highly available, as it cannot be teamed for path redundancy across multiple switches. Is this correct?

2) I was told there are data figures out there comparing iSCSI throughput vs. NFS, and that NFS is preferred. As VMware ESX writes data in 1MB blocks, what, if any, statistical issues would there be? (The only thing I can think of is that iSCSI has more overhead for byte ordering and rebuild, and so would be less 'efficient' within the constraint of Gb Ethernet, versus NFS, which is tuned for more variable block and window sizes.)

3) Since VMware ESX does not support TOE functionality, the pipeline for NFS and iSCSI is still limited to about 60 MB/s of sustained data throughput. Are there any plans to allow the interfaces through which the NFS and iSCSI (software initiator) sessions run to be 'bonded'? And does this bonding (if it exists) negate HA of the interfaces? (In other words, are the bonding and teaming capabilities base drivers that are mutually exclusive?)

4) Are there any plans in ESX for a customer to be able to leverage an iSCSI hardware initiator (QLogic iSCSI HBA) to avoid the throughput constraints of the TOE limitation and keep CPU0 from being pegged by TCP overhead, but still get HA? (In other words, install a pair of QLogic HBAs and get HA out of the design. As of now you bind LUNs directly in the iSCSI HBA and have to manually change the path at the adapter level in the event of path loss.)


Thanks

Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 10:51 AM
bhadzik (Enthusiast, 53 posts since Jan 19, 2006)

1. The data does not flow through the service console. With iSCSI, the initial scan of the iSCSI target is done with the service console, but all traffic flows through the vmkernel. You can have an additional service console and the vmkernel interface on the same vSwitch. These interfaces can be made as highly available as any other network interface.

2. iSCSI is faster than NFS. Although this depends on the storage platform, the general assumption is that iSCSI is faster.

3. See number one about the high availability of the NICs. You can set up bonding to provide larger throughput. Also, with 10-gigabit Ethernet around the corner, iSCSI will only get faster.

4. This has been supported since, I believe, 3.0.1. Check out the HCL:

http://www.vmware.com/pdf/vi35_io_guide.pdf

The QLogic 4050c and 4060 are supported.

Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 11:44 AM
in response to: bhadzik
Jeremey Wise (Novice, 49 posts since Oct 4, 2004)

Thanks for your response. Just a few more clarifications please.

1. The data does not flow through the service console. With iSCSI, the initial scan of the iSCSI target is done with the service console, but all traffic flows through the vmkernel.

---> I had heard that the console was the initiator, and maybe that was the confusion. So you are saying that "a" console interface must be on the vSwitch that also carries the iSCSI driver so that it can do the initial scan, but once the TCP session is established (I assume authentication also happens here, as required), any physical interface within that vSwitch can send and receive to that iSCSI target. Is this correct? Would that mean that all MAC addresses would have to be added to the iSCSI host's list of allowable interfaces if MAC filtering is used?

2. iSCSI is faster than NFS. Although this depends on the storage platform, the general assumption is that iSCSI is faster.

This is good to know. I thought it was odd that a higher-level protocol would have efficiencies. Does VMware have any data or recommendations to deal with the disparity between block size and throughput?

Example: ESX forces a 1 MB block I/O write even when the data from the guest OS is a 4 KB Windows page-file write. That write is encapsulated into iSCSI packets, which hold 1524 bytes per packet for regular Ethernet or 8192 bytes per jumbo frame. This still means that each I/O write from ESX over Ethernet requires 125 packets (1024000 bytes / 8192 = 125). The result is a throughput limitation, since (in this example) 124 of those packets are unnecessary compared with what the I/O write would need if the block size from the OS were maintained. For many smaller customers this may suffice, but iSCSI is becoming more prevalent. Is there a way to tune VMFS to accommodate this (outside direct raw disk allocation to each VM)?
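
To make the packet arithmetic above concrete, here is a small sketch that counts how many Ethernet frames a single write would need. The payload sizes (1,500 bytes for a standard frame, 8,192 bytes for a jumbo frame, matching the figure used in the post) and the function name are assumptions for illustration only; real iSCSI payloads are somewhat smaller once TCP/IP and iSCSI headers are subtracted.

```python
def frames_needed(io_bytes: int, payload_bytes: int) -> int:
    """Frames required to carry io_bytes, ignoring protocol headers."""
    # Integer ceiling division: round up to a whole number of frames.
    return -(-io_bytes // payload_bytes)

ONE_MB_WRITE = 1_024_000   # the 1 MB figure used in the post
GUEST_WRITE = 4 * 1024     # a 4 KB guest page-file write

for label, payload in (("standard", 1500), ("jumbo", 8192)):
    print(
        label,
        "1 MB write:", frames_needed(ONE_MB_WRITE, payload), "frames;",
        "4 KB write:", frames_needed(GUEST_WRITE, payload), "frames",
    )
```

With jumbo frames this reproduces the 125-versus-1 comparison made in the post; with standard frames the gap is proportionally similar.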

3. .....You can setup bonding to provide larger throughput....

Where in ESX can you "bond" interfaces to provide an aggregation of channels? You can add multiple physical NICs to a vSwitch, and the kernel will 'pin' a VM to an interface, in a round-robin manner, for load balancing (hence why beaconing is required when link-state changes from upstream are not triggered, even though routing requirements for data access have to force a VM pin to be shifted). This also begs the question: if bonding can be done, is it via some third-party tool (Broadcom or Intel) or, as I would suspect, by the vmkernel (in which case, does this support Cisco EtherChannel and by extension need assigned EtherChannel IDs)?

4. This has been supported since i believe 3.0.1. Check out the hcl....

The question was NOT whether VMware ESX supports iSCSI, but that the hardware-initiator way of doing iSCSI (unless VMware has something special) is similar to every other OS in that it is a single point of failure. You go into the iSCSI adapter itself (Ctrl+Q during POST for QLogic) and set the IP and target LUN for the iSCSI target. If this path fails, you have to set that target up on an alternate HBA, which, though it can be done, requires MANUAL failover, so any guest OS I/O through that HBA would be dropped. That is why customers use software iSCSI stacks: you can bond a software iSCSI driver on top of an already-HA 'teamed NIC'. Then if a single NIC fails, your data re-routing is provided by the lower-level, kernel-linked team interface. How does VMware initiate LUN path failover with a hardware initiator (which, as we know, is the only way to get TOE functions and break throughput beyond the ~60 MB/s limit)?

Thanks,


Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 11:51 AM
in response to: Jeremey Wise
mcowger (Virtuoso, 1,784 posts since Aug 22, 2007)

Hi Jeremy,

1) yes, any physical interface on which the vmkernel rides may end up sending data (depending on your load balancing choices). Yes, you would need to add all those MAC addresses (but why would you - why not simply use proper authentication like IP address or CHAP?)

2) There are many users with decent NFS arrays that find them faster under NFS than iSCSI.

3) You perform this bonding in the vSwitch configuration. The bonding does NOT require switch help; it simply does IP- or MAC-based hashing to assign traffic to a given physical interface. So, in certain cases (a sustained connection on the same port between a VM and a given other machine), you will not exceed the speed of a single port.

4) I have no experience with the load balancing of hardware iSCSI, as we use Fibre Channel. It's simply a more mature protocol.
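
A rough sketch of why hash-based assignment caps a single conversation at one port's speed. This is a simplified model that assumes the uplink is chosen by XOR-ing the source and destination addresses and taking the result modulo the number of uplinks; the function name and the addresses are hypothetical, and the actual ESX algorithm may differ in detail.

```python
import ipaddress

def pick_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Simplified IP hash: XOR both addresses, then modulo the uplink count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

vmkernel_ip = "10.0.0.10"                # hypothetical vmkernel address
targets = ["10.0.0.50", "10.0.0.51"]     # hypothetical storage targets

for target in targets:
    # The same source/destination pair always maps to the same uplink,
    # so one sustained session never spreads across several NICs.
    print(target, "-> uplink", pick_uplink(vmkernel_ip, target, 2))
```

Traffic to different targets can land on different uplinks, which evens out link usage overall, but any one source/destination pair stays on a single physical port.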

--Matt

Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 1:16 PM
in response to: mcowger
bhadzik (Enthusiast, 53 posts since Jan 19, 2006)

2. In my experience this is a very big "it depends". Different vendors implement iSCSI and NFS in many different ways. It has a lot to do with the array itself and how the vendor implemented iSCSI or NFS support.

3. To have proper link aggregation (EtherChannel from Cisco), some switch configuration is required.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1001938&sliceId=2&docTypeID=DT_KB_1_1&dialogID=45605577&stateId=0%200%2045603357

If you don't do that, it still works, but the switch sees it as two separate interfaces instead of a bonded set.

4. Immature? TCP/IP is almost 30 years old :) (I know, bad joke)

Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 1:19 PM
in response to: bhadzik
mcowger (Virtuoso, 1,784 posts since Aug 22, 2007)

2. Agreed :)

3. Agreed here as well, as long as what you care about is overall throughput to a given VM. We don't; we care about averaging out link usage and failover, for which you DON'T need to set up EtherChannels.

4. Heh :)

--Matt

Re: Building VMWare farm on NFS / iSCSI with HA, Feb 7, 2008 1:34 PM
in response to: mcowger
Jeremey Wise (Novice, 49 posts since Oct 4, 2004)

Thanks for the replies.

Good information, and it helps me clarify what we can position with the new emergence of NFS and iSCSI technologies, but...

Question 4 is still not answered, as I see it:

1) If you use Hardware Initiator iSCSI HBAs, you have a single point of failure.

2) If you use hardware-initiator iSCSI HBAs, you get TOE functionality and by extension a throughput jump, which will drive the guest count per system significantly higher.

3) With the software-initiator iSCSI driver you can get multiple physical interfaces bonded (aggregation) or teamed (high availability), and so provide a disk access path for business-critical systems (unlike hardware initiators).

4) Software-initiator iSCSI cannot use TOE, which limits the total throughput to ~60 MB/s per interface (which, with a 1 MB block-size I/O write, is around sixty I/Os per second sustained per interface... about...).

5) NFS is a higher-level protocol and is software only. It is HA, but it is constrained not only by the non-TOE throughput of ~60 MB/s but also by some additional overhead from being a higher-level protocol... though mileage on actual data is contingent on the tuning of NFS on both sides and the disk delay of the target storage systems being tested against (Linux box running NFS v1 on two IDE = not so good :>)
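
A quick check of the per-interface figure in point 4, taking the ~60 MB/s ceiling quoted in this thread as given (it is the poster's estimate, not a measured value). The function name is hypothetical.

```python
def ios_per_second(throughput_mb_s: float, io_size_mb: float) -> float:
    """Sustained I/Os per second at a given throughput and I/O size."""
    return throughput_mb_s / io_size_mb

# 1 MB writes at the ~60 MB/s ceiling quoted in the thread.
print(ios_per_second(60, 1))

# For comparison: 4 KB writes at the same throughput.
print(ios_per_second(60, 4 / 1024))
```

Dividing the throughput by the I/O size gives roughly 60 I/Os per second for 1 MB writes, and far more for small writes if the block size from the guest were preserved.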


Re: Building VMWare farm on NFS / iSCSI with HA, Feb 8, 2008 4:59 AM
chrisy (Enthusiast, 57 posts since Apr 27, 2005)

There seems to be a misunderstanding about how iSCSI HBAs are used in ESX. Other than a couple of settings (KATO, jumbo frames, etc.) you don't need to set anything in the card BIOS. Say you have a dual-port QLogic iSCSI HBA. You can give it network settings using the VI Client and tell it to rescan. It detects all the LUNs you've presented to ESX (using CHAP for authentication is probably the easiest). If one link fails, ESX knows the other one is a valid potential path and can fail over to it. There's support for the different behaviour of active/active and active/passive arrays. For example, on an EqualLogic array VMware knows any volume can be accessed on any controller. In that case you can ask the VI Client to set one HBA port as active for some LUNs and the other HBA port as active for other LUNs, giving a primitive form of manual load balancing. There's a round-robin setting now in ESX, but it's experimental. Use of an HBA means the service console is not involved with iSCSI.

So, a dual-port HBA gives you resilience and, depending on the SAN, two active ports. It's possible to do even better, however.

As well as doing the above, connect a couple of ESX NICs to the storage switch. Make some LUNs for individual virtual machines, and have the SAN present them to the virtual machine, NOT to the host. In the guest, use the Microsoft software initiator to pick up these volumes (using MPIO, of course). This gives you an easy way to get a share of 2G bandwidth plus one HBA port's bandwidth (for the boot drive) onto each virtual machine. It sounds complex but is not really so bad, and it is documented to work very well; for example, there is a Microsoft ESRP document that certifies 60,000 Exchange users hosted with that architecture. To clarify, those new ports do not involve the ESX storage stack at all.
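
The failover and manual load-balancing behaviour described above can be modelled roughly as follows. This is only an illustrative sketch: the Lun class, the active_path function, and the port names are hypothetical and are not ESX APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Lun:
    name: str
    preferred: str                                    # HBA port set as active for this LUN
    paths: List[str] = field(default_factory=list)    # every port that can reach it

def active_path(lun: Lun, link_up: Dict[str, bool]) -> Optional[str]:
    """Use the preferred port while its link is up, else any healthy path."""
    if link_up.get(lun.preferred, False):
        return lun.preferred
    for port in lun.paths:
        if link_up.get(port, False):
            return port
    return None  # no path left: the LUN is unreachable

ports = ["port_a", "port_b"]
luns = [Lun("lun0", "port_a", ports), Lun("lun1", "port_b", ports)]

# Both links up: the LUNs are split across the two ports.
print({l.name: active_path(l, {"port_a": True, "port_b": True}) for l in luns})

# port_a fails: lun0 moves to port_b without manual intervention.
print({l.name: active_path(l, {"port_a": False, "port_b": True}) for l in luns})
```

Splitting the preferred ports across LUNs is what gives the manual load balancing; the fallback loop is what gives the resilience.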

--

Chris

