VMware Cloud Community
mldmld
Enthusiast

ESX over NFS or iSCSI

Hi,

I have read that iSCSI is used by many people with ESX Server and that the performance is not bad in "real life".

I would like to know why NFS should not be considered for a production environment.

Here are my thoughts:

With iSCSI

- ESX must serialize every I/O on a VMFS LUN.

- iSCSI can use only one Ethernet link at a time, so the limit is 1 Gb/s.

With NFS

- ESX can access all VMs concurrently.

- One can use a VIF (an aggregate of Ethernet links) to get more than 1 Gb/s.

- Backing up a VM is like backing up files. Very easy with NetApp snapshots, for instance.

- SnapMirror of VMs to a DR site.

So my main question is about performance.

Has anyone tested NFS in a production environment? With how many VMs?

Thanks

ML

dalepa
Enthusiast

We run over 950 VMs across 35 ESX hosts over NFS. We use snapshots and SnapMirror for DR.

While not all NFS servers are created equal, NFS on NetApp has several VMware benefits (including performance) over FC/iSCSI.

More on the subject here:

dlp

WillemB
Enthusiast

We have roughly 100 VMs on 10 hosts using iSCSI and it performs nicely.

We even have all VLANs on the same Cisco switches, which are also connected to our datacenters with physical servers.

So there is no need to be scared of such a solution; 1 Gbps is a fair amount of bandwidth.

To get past the 1 Gb limit you can team NICs for redundancy and load balancing. VMs will be able to use different paths to the switch and thus provide up to 2 Gbps of throughput.
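As a rough sketch (ESX 3.x service console; the vSwitch and vmnic names are just examples), linking a second physical NIC to the vSwitch is all the host-side teaming needs; the load-balancing policy itself is then chosen in the VI Client:

esxcfg-vswitch -l                   # list vSwitches and their current uplinks
esxcfg-vswitch -L vmnic3 vSwitch1   # add a second uplink to the (example) vSwitch carrying storage traffic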

I would advise looking at disk I/O before choosing a solution. Whether it will work depends on how you currently use your infrastructure.

For example, a number-crunching farm will react differently than a file server or a server farm with application hosting (one app per server).

My example is a reference environment with application servers and infrastructure servers such as domain controllers (hence the 1:10 ratio).

Depending on your budget, I would always advise a Fibre Channel SAN unless somebody else is willing to take the risk :-).

I have never tried NFS, but I assume it will perform the same or worse since it adds a filesystem layer onto which the files are projected.

Some NAS devices give you both iSCSI and NFS, so give it a ROARrrr!

mldmld
Enthusiast

Thanks for this info.

With iSCSI, OK, one can use two links for redundancy, but there is no multipathing according to the Server Configuration Guide.

For NFS, which export options do you use?

Are there special parameters on the ESX or NetApp side?

Do you use jumbo frames, for instance?

Thanks

dalepa
Enthusiast

Anonymous User ID=0

Security (sys)

Read-Write (VMkernel IPs)

We also use "Actual Path" on our SnapMirror destination's exported qtree to keep the exported path the same as on the primary in the event of a failover. We use ASIS on the secondary and get a 60%+ dedup rate.
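For what it's worth, a minimal sketch of what such an export might look like in /etc/exports on a 7-mode filer (the volume name and VMkernel addresses are made up):

# rw + root access limited to the ESX VMkernel IPs, anonymous user mapped to 0
/vol/esx_nfs_ds1 -sec=sys,rw=192.168.10.11:192.168.10.12,root=192.168.10.11:192.168.10.12,anon=0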

No jumbo frames.

Our ESX hosts have 2 vSwitches with 2 Ethernet ports each, one vSwitch for the guests and one for the VMkernel + service console. We also do VLAN tagging...
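A rough sketch of that VMkernel side from the service console (the port group name, VLAN ID, and IP address are hypothetical):

esxcfg-vswitch -A VMkernel-NFS vSwitch1          # add a VMkernel port group to the storage vSwitch
esxcfg-vswitch -v 105 -p VMkernel-NFS vSwitch1   # tag it with the storage VLAN
esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 VMkernel-NFS   # give the VMkernel interface its IP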

A picture is better than words...

mldmld
Enthusiast

Why don't you use ASIS on the primary? It's supported. Any issues?

Is no VMotion used? Is that related to NFS?

Thanks

dalepa
Enthusiast

We plan to use ASIS on the primary someday; however, we currently have plenty of space. If we had had ASIS a year ago, we would have purchased half the storage. Using ASIS on the primary makes VSM setup very easy, plus you have the snapshots in two locations...
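For reference, enabling ASIS (deduplication) on a volume is only a couple of commands on a 7-mode filer (the volume name is hypothetical):

sis on /vol/esx_nfs_ds1        # enable dedup on the volume
sis start -s /vol/esx_nfs_ds1  # scan the existing data once
df -s /vol/esx_nfs_ds1         # report the space savings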

"No vmotion"? We use Vmotion daily...

dlp

mldmld
Enthusiast

Very interesting!

How do you connect your ESX hosts to the NetApp box?

How many VMs per volume?

How many volumes per aggregate?

How many (FC?) disks per aggregate?

How many volumes per FAS?

How many Ethernet links? In active/active or active/passive mode?

We plan to use blade servers, so we plan to share 2 Ethernet links among 8 ESX servers: 1 link for the production network and 1 link for the backup network (used for ESX NFS access in this case).

And how do you back up the VMs?

Do you synchronize NetApp snapshots with ESX snapshots?

vmware-cmd <VM path> createsnapshot backup netapp_backup 1 0
take the snapshot on the NetApp
vmware-cmd <VM path> removesnapshots

Or is that in fact unnecessary? Do you only take snapshots on the NetApp without syncing with ESX?

Does anyone else use NFS in a production environment?

Best regards

ML

admin
Immortal

mldmld states:

  • With iSCSI

    • ESX must serialize every I/O on a VMFS LUN.

This is not true. VMFS I/O goes to a SCSI device, and there is no serialization issue. Multiple I/Os, possibly hundreds, could be in flight to a VMFS LUN from multiple VMs all at the same time. Perhaps you meant something else?

Irfan

urgrue
Contributor

We've been using iSCSI for about 6 months. Performance has been downright abysmal. Deployment takes 2 hours (where it takes <10 minutes on a traditional FC SAN). At some point, under heavy load, we experience things like LUNs jamming, partitions self-destructing, and performance slowing down to a crraaaawwwlll. Operations like mkfs, which normally take seconds or minutes, can literally take hours.

I have never succeeded in finding the reason for this despite help from NetApp and VMware. We've gone over all aspects (network, disk load, the load on the VMs) and nowhere is anything particularly heavily loaded. We've upgraded and downgraded and, well, tried everything we can think of.

Everything worked decently (though still much more slowly than expected) until we started having >80 or so VMs, at which point it all went to hell. This is all using the software initiator over dedicated NICs+networks against a NetApp.

Anyway, we gave up and are in the process of migrating to a traditional SAN, and on the side I'm testing NFS, which will hopefully work out, as it would have a great many benefits over SAN.

I know iSCSI has worked wonderfully for some people, and I wish I were one of them, but just so you know, not everyone is.

dalepa
Enthusiast

I have not tested iSCSI, mainly due to the great performance over NFS on NetApp... I would be very interested in hearing how your VMs perform over NFS vs. iSCSI.

Do you have any idea how much overall data your 80 VMs generate? Also, what type of VMs? Exchange, SQL, apps, web, etc.

mldmld
Enthusiast

Hi

Dalepa, could you answer my post #7 above? It may help to avoid a bottleneck somewhere.

In my company, we installed Oracle databases on NFS and it works great! So NFS for VMware: why not?

Thanks

ML

dalepa
Enthusiast

Very interesting!

How do you connect your ESX hosts to the NetApp box?

We have two Cisco 6509s. The NetApp has 8 Gigabit Ethernet ports, configured as 2 VIFs of 4 ports each.
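A sketch of that kind of two-level VIF on the filer (interface names and the address are made up): one multi-mode VIF of 4 ports to each switch, stacked into a single-mode VIF:

vif create multi vif_sw1 e0a e0b e0c e0d    # 4 ports to the first 6509
vif create multi vif_sw2 e0e e0f e0g e0h    # 4 ports to the second 6509
vif create single vif_top vif_sw1 vif_sw2   # single-mode VIF on top: one side active, one standby
ifconfig vif_top 192.168.10.50 netmask 255.255.255.0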

How many VMs per volume?

As many VMs as we can fit into 4 TB volumes; with ASIS, 2-3x that number.

How many volumes per aggregate?

Two 16 TB aggregates, with two 4 TB volumes in each aggregate.

How many (FC?) disks per aggregate?

All FC disks, 40 disks per aggregate.

How many volumes per FAS?

4

How many Ethernet links? In active/active or active/passive mode?

4 active / 3 passive per head.

We plan to use blade servers, so we plan to share 2 Ethernet links among 8 ESX servers: 1 link for the production network and 1 link for the backup network (used for ESX NFS access in this case).

I'm not a fan of using blades for ESX hosts, but that's another issue...

And how do you back up the VMs?

We SnapMirror to a NetApp R200 and keep a snapshot a day for 21 days.
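A sketch of that schedule on the filer side (the volume and filer names are made up): one daily snapshot kept for 21 days, plus a nightly SnapMirror update:

snap sched esx_nfs_ds1 0 21 0   # keep 0 weekly, 21 nightly, 0 hourly snapshots
# /etc/snapmirror.conf on the destination: pull from the primary every night at 01:00
fas3070:esx_nfs_ds1  r200:esx_nfs_ds1_mirror  -  0 1 * *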

Do you synchronize NetApp snapshots with ESX snapshots?

No.

vmware-cmd <VM path> createsnapshot backup netapp_backup 1 0
take the snapshot on the NetApp
vmware-cmd <VM path> removesnapshots

Or is that in fact unnecessary? Do you only take snapshots on the NetApp without syncing with ESX?

That's correct... Our VMs are basically crash-recoverable (just like pulling the plug on the power).


BigHug
Enthusiast

Hi, ML:

Are all 950 VMs happy with crash-consistent snapshots? Has no app or Windows installation failed to recover? We like NetApp's snapshots. However, we are not quite comfortable with just crash-consistent copies.

Thanks for any input.

dalepa
Enthusiast

Over the last year we have restored at least 100 VMs... No problems.

Crash recovery of database VMs may be a concern; however, we currently don't run databases on VMs.

SnapManager for VMware is in the works, so that may resolve this issue for databases...

In reality, the only reason to go back to a snapshot is because something major occurred on the VM. And if your VM can't handle a power outage, I would think you have larger problems...

Also, we have 22 days' worth of snapshots to choose from... How many snapshots do you have on your SAN? With most SANs you are restoring from tape, and we all know how well that works.


mldmld
Enthusiast

Thanks Dalepa,

The 6509 is a Cisco switch?

Which FAS do you use?

So you have 2 VIFs of 4 ports. Do you mean your filer has a two-level VIF?

I mean, you created 2 multi-mode VIFs of 4 Ethernet ports each, with 4 ports connected to each Cisco switch,

and then you aggregated them into one VIF in single mode?

But you also say you configure 4 active / 3 passive per head?

On the other hand, what is the network configuration on the ESX side?

I'm very surprised by the number of disks you put into one aggregate. With that many disks, the I/O throughput is the highest one can get. But is such an aggregate reliable? I mean, with so many disks, have you experienced a volume crash, even with the RAID-DP feature? How long have these aggregates existed?

In my company, we create aggregates of 13 disks. Do you think that could cause a performance issue for ESX/NFS?

Sorry to ask so many questions, but I would like to avoid pitfalls!

Best regards

ML

dalepa
Enthusiast

Thanks Dalepa,

The 6509 is a Cisco switch? Yep.

Which FAS do you use? FAS3070, R200, and FAS3050.

So you have 2 VIFs of 4 ports. Do you mean your filer has a two-level VIF? Yep, a trunk and two VIFs.

I mean, you created 2 multi-mode VIFs of 4 Ethernet ports each, with 4 ports connected to each Cisco switch? Yep.

Then you aggregated them into one VIF in single mode? Yep.

But you also say you configure 4 active / 3 passive per head? One VIF is standby, the other is preferred.

On the other hand, what is the network configuration on the ESX side? See the photo above.

I'm very surprised by the number of disks you put into one aggregate. With that many disks, the I/O throughput is the highest one can get. But is such an aggregate reliable? I mean, with so many disks, have you experienced a volume crash, even with the RAID-DP feature? How long have these aggregates existed?

We use a RAID group size of 28. We have 24 clustered filers across North America with zero data loss over 10 years.
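For comparison, creating an aggregate like that on a 7-mode filer would look roughly like this (the aggregate name and disk count are hypothetical):

aggr create aggr_esx -t raid_dp -r 28 40   # 40 disks, RAID-DP, RAID group size of 28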

In my company, we create aggregates of 13 disks. Do you think that could cause a performance issue for ESX/NFS?

Yes, maybe. The more disks you have, the better the performance.

Sorry to ask so many questions, but I would like to avoid pitfalls!

We have done several conference calls hosted by NetApp with several Fortune 500 companies on this topic...

Send me a private message here and I'll put you in contact with our NetApp rep if you are interested.


mldmld
Enthusiast

Hi,

I've just read http://www.netapp.com/go/techontap/matl/downloads/NAS-presentation.pdf . I have attached it to my post.

On slide 13 they warn about NFS and single-link usage.

It says: "NFS traffic uses a single VMkernel IP address per vSwitch / set of teamed NICs. This means that a single uplink will be selected for ALL traffic with:

• Load balancing based only on source IP/MAC/port

• Load balancing with source IP and a SINGLE DESTINATION IP

• A single datastore"

Does this mean that NFS traffic will only use one uplink, whatever NIC-teaming configuration is selected on the ESX server's storage VMkernel, if only one IP address is used on the storage side, even with a VIF?

To avoid this, does it mean that I should:

- on the ESX side, select a load-balancing policy, and the same on the Cisco switches;

- on the NetApp side, declare the same number of IP addresses as the number of active-active trunked NICs, so 2 IP addresses for 2 x 2 NICs in a two-level VIF as in the document's example;

- create a DNS entry for each IP address (like nfs1.toaster.company.com, nfs2.toaster.company.com);

- then, when connecting the datastores on an ESX host, use the nfs1.toaster.company.com and nfs2.toaster.company.com names alternately (see the sketch after this list);

- and spread the VMs across all volumes?
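For the last two points, here is a sketch of what I have in mind from the ESX service console (the volume and datastore names are made up; the nfs1/nfs2 aliases are the DNS entries above):

esxcfg-nas -a -o nfs1.toaster.company.com -s /vol/esx_nfs_ds1 ds1   # first datastore through the first alias
esxcfg-nas -a -o nfs2.toaster.company.com -s /vol/esx_nfs_ds2 ds2   # second datastore through the second alias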

Thanks

ML

WillemB
Enthusiast

Urgrue, I've noticed you have extreme problems with iSCSI, which is strange since all is well at my company (>100 VMs and growing without problems). Sounds to me like a network communications problem. I can think of a few possibilities (it probably won't be any of them, but we're here to 'try' and help ;-) ).

1. It sounds like a flapping duplex setting. You've probably checked this already, but if not, set everything to fixed 1000 Mbps speed settings (see the sketch after this list). Cisco switches are known to flip-flop on auto-negotiation.

2. I've heard of people having mismatched MTU settings between the iSCSI NAS and the host.

3. Latency could be too high for iSCSI to handle. That can result in unexplainable slowdowns.

4. I've heard of people experiencing unexplainable slowdowns with iSCSI where the cause turned out to be a slow part of the network. Both ends were connected at 1000 Mbps, but somewhere in the middle there was a piece of network equipment running at 100 Mbps.

The limiting 100 Mbps segment wasn't a bandwidth bottleneck, but the speed transition was very problematic for the iSCSI connection.
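On point 1, checking and, if needed, forcing speed/duplex from the ESX service console is quick (the vmnic name is just an example); do the same on the switch ports:

esxcfg-nics -l                       # show speed and duplex for all physical NICs
esxcfg-nics -s 1000 -d full vmnic2   # force 1000 Mbps full duplex if auto-negotiation is flapping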

To continue the discussion, here's someone who has tested FC, iSCSI, and NFS. There are positive comments about both iSCSI and NFS. The closing argument is very interesting.

Quote "Hi,I made last year a test to compare FC, iSCSI and NFS using iozone on a physical Linux RH 4.

The storage was a NetApp FAS3020, and the server a HP BL20p.

FC & iSCSI where very close

NFS was really slower."

And another quote: "FC is always faster with 1 or 2 ESX hosts... however, the more ESX hosts you add, the faster NFS performs. This is because of FC SCSI reservations in ESX: essentially only one host can read or write to a LUN at a time with FCP. With NFS this is not a limitation. Hence, the more hosts, the better the performance on NFS compared to presenting a number of LUNs to several hosts on FC..."

admin
Immortal

Dalepa,

I am looking for reference companies running VMware over NFS on NetApp storage. Is it OK if I contact you offline?
