VMware Cloud Community
Macomar
Contributor

What is wrong with my infrastructure?

Hello everybody,

I am currently building up the following infrastructure on VMware vSphere 5.1 Enterprise. Here is what is in place:

Hardware:

  • 7x HP DL380 G7 servers, each with 2 CPU sockets and 192 GB RAM.
  • Every server is equipped with 12 network cards.
  • Storage is a NetApp FAS2240-2 with 2 controllers and 24 hard disks.
  • The NetApp has a mezzanine card per controller, so 2x 10 Gb NICs are available per controller.
  • The 10 Gb NICs on the NetApp are bundled into a virtual interface (VIF) on each controller.
  • The switches are Cisco WS-C3750X-24, also equipped with 10 Gb modules.
  • The Cisco switch model is certified by NetApp, and the cabling from storage to switch was supplied directly by NetApp.

Software:

  • Each HP server is installed with VMware ESXi 5.1 build 799733.
  • The ESXi installation image is the HP-customized one with the server drivers included.
  • The NetApp runs Data ONTAP 8.1.1P1 7-Mode.
  • We use the software iSCSI HBA on the ESXi hosts.

Now the big problem:

  • I converted some physical servers with VMware Converter into the VM infrastructure.
  • Some of these servers run MS SQL Server and Oracle.
  • The problem is that storage I/O over iSCSI is now slower than before.
  • I am posting IOmeter screenshots in this thread.

Virtual networking in vCenter is set up as follows:

  • Four NICs on each server in the VM cluster send the storage traffic to the Cisco switch.
  • The Cisco switch forwards the traffic via its 10 Gb modules to the NetApp.
  • I followed the how-tos to build iSCSI multipathing on the vSwitch that sends the storage traffic to the NetApp (a rough sketch of that configuration follows below the list).
  • The virtual machine VMDKs are stored in one big aggregate on the NetApp.
  • The aggregate is split into three volumes (one for the NetApp WAFL system and two for VM data).
  • Each of the two data volumes contains one LUN, in which the VMDKs are stored.
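
For reference, the iSCSI multipathing part of the host setup follows the usual port-binding pattern. A minimal sketch of the per-host commands is below; the names (vSwitch1, iSCSI-1/iSCSI-2, vmk1/vmk2, vmnic2/vmnic3, vmhba33) and IPs are placeholders, only two of the four storage uplinks are shown, and jumbo frames are assumed to be enabled end-to-end:

  # vSwitch and one port group per storage uplink, MTU 9000 on the vSwitch
  esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
  esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-1
  esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-2

  # each port group gets exactly one active uplink, so every vmkernel port is one path
  esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-1 --active-uplinks=vmnic2
  esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-2 --active-uplinks=vmnic3

  # vmkernel ports with MTU 9000 and static IPs in the storage VLAN
  esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-1 --mtu=9000
  esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-2 --mtu=9000
  esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.10.11 --netmask=255.255.255.0 --type=static
  esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.10.12 --netmask=255.255.255.0 --type=static

  # bind both vmkernel ports to the software iSCSI adapter
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2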

Now I am experimenting with the VMware best practice guide "Oracle Databases on VMware". What is interesting is that the performance is equally bad on every server and virtual machine.

It does not matter whether the virtual machine runs MS SQL, Oracle, or is just a plain Windows 2008 server with nothing on it. IOmeter still shows bad results for 4K block reads and writes.

So the question here is: what is wrong?
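
In case it helps anyone reading along: the numbers above can be cross-checked with plain esxtop on the host while IOmeter is running; nothing special is assumed here beyond the standard disk views.

  # on the ESXi host, while the IOmeter test runs
  esxtop
  # 'd' = disk adapter view, 'u' = disk device view
  # DAVG/cmd is latency from the array, KAVG/cmd is latency added inside the kernel,
  # GAVG/cmd (roughly DAVG + KAVG) is what the guest actually sees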

Many thanks for any solution.

Marc

32 Replies
Macomar
Contributor


OK, thank you very much for your advice. I will check the "red state" details and keep you up to date on what they show.

And please keep thinking about what you did to mitigate the problem. ;-)

I went through the detailed status of the "red state" server in the VSC plugin. Where or what do I have to look for? There are no red markings that give me a hint of an error.

Macomar
Contributor

The NetApp is configured active/active.

stainboy
Contributor

Hmmm, I don't think that is a "real" active/active array. You have two controllers, but I think each one owns its own LUNs. So when you created the LUNs, you created them separately on each controller, right?

The 2040 worked that way, and only high-end arrays like a VMAX are truly active/active. So I think you have ALUA, but as far as I remember it is not available for iSCSI. Without it, the only way SP1 would pick up the LUNs on SP2 would be on a failover. If that's the case, make sure you divided the storage between the two controllers.

oompaloompa31
Contributor

We have performance problems with a similar configuration:

DL380 G7s running vSphere 4.1 U3

CN1000Q (10 GbE HW iSCSI)

NetApp FAS2240s

HP A5800 as our storage switch for connectivity between the VM hosts and the NetApp storage

We've tried the 10 GbE converged adapters as HW iSCSI and SW iSCSI, and even swapped the 10 GbE cards out completely for 1 GbE HW iSCSI cards.

We're in the process of replacing the A5800 with a different 10 GbE switch.

Which code version is your NetApp FAS2240 running?

We're at 8.1.1 7-Mode for our production storage.

Have you been able to resolve your problem?

Thanks

stuvstuv
Contributor

Hi there,

I have similar problems, but I am using IBM x3755 M3 4-socket servers, Citrix XenServer, a Cisco MDS 9148, and a NetApp FAS2240-2 with an active/active controller pair.

I have another XenServer in the lab with SATA drives on a RAID controller with BBWC, and I get better performance there than on the NetApp!

I have bad read performance in the virtual machines and on the database cluster.

I have since discovered that the NetApp FAS2240-2 does not have any cache: no read cache, no write cache. The models higher up do.

I have decided to add another shelf, extend the aggregate with the same disks, and add 4x 200 GB SSDs for a Flash Pool.
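
A rough sketch of what I have in mind on the 7-Mode CLI is below. The aggregate name (aggr1) and the disk count are placeholders, and the syntax is from memory, so it should be checked against the Flash Pool documentation for 8.1.x before running it:

  # convert the existing aggregate into a hybrid (Flash Pool) aggregate
  aggr options aggr1 hybrid_enabled on

  # add the four SSDs as the cache tier of that aggregate
  aggr add aggr1 -T SSD 4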

Hopefully, this will help.

Regards

Stavros

sharkspear
Contributor

Stavros,

I'm not sure where you got the information that the 2240 has no read or write cache, but that information is inaccurate; maybe you got it from HP.

I can speak with some authority when I say that both the FAS2220 and the FAS2240 have 6 GiB of DDR-based RAM per controller, most of which is used as a unified read/write cache. Of that memory, about 800 MiB is battery backed, mirrored to its partner controller, and used to protect uncommitted write operations. After the ONTAP operating system requirements are taken into account, on a dual-controller system this gives you around 8-10 GiB of cache memory for your workloads, which, given ONTAP's architecture, is usually more than enough to extract maximum performance from the relatively small number of disk spindles usually assigned to those controllers.

Depending on your workload, you may get better performance out of a similar number of SATA spindles that are direct-attached to a machine versus attached via some form of network; e.g. single-threaded, large-block sequential workloads usually benefit from the lower latency provided by a PCIe-attached RAID controller. On the other hand, virtualisation workloads are very rarely characterised by that kind of I/O pattern; they typically involve a large number of simultaneous random reads and writes. As a result, the optimisations that an intelligent controller can make in a spindle-constrained environment usually provide a significant performance benefit.

While I don't have the bandwidth to help you troubleshoot your issue, there are a number of free tools available from NetApp, such as OnCommand Performance Manager, that will help you isolate where the performance problem might be, and you can always open a performance troubleshooting ticket with NetApp support.
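
As a very first look, even before those tools, a couple of standard 7-Mode commands on the controller console will show whether the CPUs or disks are saturated while a test runs (the volume name below is just a placeholder):

  # one-second samples of CPU, network, disk utilisation and latency
  sysstat -x 1

  # read/write latency counters for a specific volume
  stats show volume:vm_data1:read_latency
  stats show volume:vm_data1:write_latency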

Regards

John Martin

Principal Technologist - NetApp ANZ.

mikeyb79
Enthusiast

If you have the 10GbE NICs bonded as an IFGRP or VIF or whatever they call them these days, unbond them. IFGRP plus iSCSI is a no-no. Bind both 10GbE NICs to the iSCSI service on each of the NetApp controllers individually and set them for RR multipathing over jumbo frames (enabled end-to-end) and it'll work properly. At least the pathing will.
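
On the 7-Mode CLI that would look roughly like this. The interface names (e1a/e1b), the ifgrp name (vif10g) and the IPs are placeholders for whatever is actually configured, it belongs in a maintenance window because it tears down the storage network, and the matching lines in /etc/rc also need updating so it survives a reboot:

  # take the ifgrp down, remove it, and give each 10GbE port its own address
  ifconfig vif10g down
  ifgrp destroy vif10g
  ifconfig e1a 192.168.10.21 netmask 255.255.255.0 mtusize 9000 partner e1a
  ifconfig e1b 192.168.10.22 netmask 255.255.255.0 mtusize 9000 partner e1b

  # make sure both interfaces are allowed to carry iSCSI traffic
  iscsi interface enable e1a
  iscsi interface enable e1b

  # from each ESXi host, verify jumbo frames end-to-end (8972 = 9000 minus IP/ICMP headers)
  vmkping -d -s 8972 192.168.10.21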

Macomar
Contributor

Hi,

Why is it a "no-no" to bind the 10GbE NICs into a VIF/ifgrp on the NetApp filer with iSCSI? That is the configuration I have at the moment.

mikeyb79
Enthusiast
Enthusiast

Each individual network link is a path to the LUN. You can't aggregate them and then try to multipath over them. Break the ifgrp and bind each interface to iSCSI individually. I had a 2240-4 in a previous job, and we were getting ~2800 IOPS on the random 8K test with 16x 1TB SATA drives in an aggregate when it was configured properly (binding two of the interfaces to the iSCSI service, with no ifgrp).
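
On the ESXi side, once each NetApp interface shows up as a separate path, round robin is set per LUN. A short sketch is below; the naa.* identifier is a placeholder for the actual NetApp LUN, and the iops=1 setting is an optional tweak that switches paths on every command:

  # use round robin for this device and rotate paths after every I/O
  esxcli storage nmp device set --device naa.60a98000xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.60a98000xxxxxxxxxxxxxxxx --type iops --iops 1

  # confirm that all expected paths are listed and active
  esxcli storage nmp path list --device naa.60a98000xxxxxxxxxxxxxxxx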

Macomar
Contributor

Hi,


Here is a screenshot of my config:

netapp_vif_1.jpg

And here is the iSCSI binding of the LUN in VMware:

netapp_vif_2.jpg

What configuration would you use? Can you please describe it for me?

Thanks,

Macomar

Mike_b79
Contributor

Remove the VIF on the NetApp side and remove the port channel on the switch side. Do not aggregate links with iSCSI.
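
On a 3750-X that is roughly the following (interface and port-channel numbers are placeholders for whatever is actually configured on the NetApp-facing 10G ports; the jumbo MTU line only applies if you run jumbo frames end-to-end):

  ! remove the channel-group from the NetApp-facing ports and delete the port-channel
  interface TenGigabitEthernet1/1/1
   no channel-group
  interface TenGigabitEthernet1/1/2
   no channel-group
  no interface Port-channel1

  ! optional: jumbo frames on the 3750-X (global setting, takes effect after a reload)
  system mtu jumbo 9198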

stuvstuv
Contributor

Hi,

I am designing a new system with 10 Gbps links to two switches for an iSCSI datastore on VMware. Should these links not be aggregated into a port channel, but rather be kept as separate active/passive links for redundancy? Can anybody explain to me why a port channel is a bad idea?

stuv

Macomar
Contributor

Hi stuvstuv, that's an interesting question. I would like to have an explanation too. :-)
