Hello to everybody,
Currently I'm building up the following infrastructure on VMware vSphere 5.1 Enterprise. The following is given:
Hardware:
Software:
Now the big problem:
Virtual networking in vCenter is built up as follows:
Now I'm experimenting with the VMware best practice guide "Oracle Databases on VMware". The interesting thing is that performance is equally bad on every server and virtual machine.
It doesn't matter whether the virtual machine runs MS SQL, Oracle, or is just a plain Windows 2008 server with nothing on it. Iometer still shows bad results on the 4k block read and write tests.
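For anyone who wants to reproduce the test without Iometer: a roughly equivalent run with fio from a Linux VM would look like the sketch below. The queue depth and job count are assumptions on my part, not exact Iometer settings.

    # 4k random read test against a file on the datastore (60s, direct I/O)
    fio --name=4k-randread --filename=/data/testfile --size=4g \
        --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based
    # same again for 4k random write
    fio --name=4k-randwrite --filename=/data/testfile --size=4g \
        --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based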
So the question here is: what is wrong?
Many thanks for a solution
Marc
OK, thank you very much for your advice. I will check the "red state" details and will keep you up to date on what they tell me.
And please keep thinking about what you did to mitigate the problem!
I went over the details status of the "red state" server in the VSC plugin. Where or what do I have to search for? There are no red markings
that would give me a hint of an error.
The NetApp is configured active-active.
Hmmmm. I don't think that is a "real" active/active array. You have two controllers, but I think each owns its own LUNs... So when you created the LUNs, you did it on each controller separately, right?
The 2040 worked that way, and only high-end arrays are true active/active, like a VMAX. So... I think you have ALUA, but it's not available for iSCSI as far as I remember. Without it, the only way SP1 would pick up the LUNs on SP2 would be on a failover. If that's the case, make sure you divided the storage between the two controllers.
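If you want to see what the host actually negotiates, something like this on the ESXi shell shows the SATP/PSP in use and the state of every path (the naa ID is just a placeholder for one of your LUNs):

    # multipathing plugin and configuration for one LUN
    esxcli storage nmp device list -d naa.60a98000xxxxxxxx
    # every path to that LUN and which portal/controller it goes through
    esxcli storage core path list -d naa.60a98000xxxxxxxx

If ALUA were in play you would see VMW_SATP_ALUA listed there; otherwise the paths to the partner controller will just sit in standby until a failover.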
We have performance problems with a similar configuration:
DL380 G7s running vSphere 4.1 U3
CN1000Q (10Gb HW iSCSI)
NetApp FAS 2240s
HP A5800 as our storage switch for connectivity between VM hosts and NetApp storage
We've tried the 10Gb converged adapters as HW iSCSI and SW iSCSI, and totally swapped out the 10Gb cards for 1Gb HW iSCSI cards.
We're in the process of replacing the A5800 with a different 10Gb switch.
Which code version is your NetApp FAS2240 running?
We're at 8.1.1 7-Mode for our production storage.
Have you been able to resolve your problem?
Thanks
Hi there,
I have similar problems, but I am using IBM x3755 M3 4-socket servers, Citrix XenServer, a Cisco MDS 9148, and a NetApp FAS 2240-2 in an active-active controller configuration.
I have another XenServer in the lab with SATA drives on a RAID controller with BBWC, and I get better performance there than on the NetApp!
I have bad read performance in the virtual machines and on the database cluster.
I have since discovered that the NetApp FAS 2240-2 does not have any cache: no read cache, no write cache. The models higher up do.
I have decided to implement another shelf, extend the aggregate with the same disks, and add 4x 200GB SSDs for a Flash Pool.
Hopefully, this will help.
Regards
Stavros
Stavros,
I'm not sure where you got the information that the 2240 has no read or write cache, but that information is inaccurate; maybe you got it from HP.
I can speak with some authority when I say that both the FAS2220 and the FAS2240 have 6GiB of DDR-based RAM per controller, most of which is used as a unified read/write cache. Of that memory, about 800MiB is battery-backed and mirrored to its partner controller, and is used to protect uncommitted write operations. After ONTAP operating system requirements are taken into account, on a dual-controller system this gives you around 8-10GiB of cache memory for your workloads, which, given ONTAP's architecture, is usually more than enough to extract maximum performance from the relatively small number of disk spindles usually assigned to those controllers.
Depending on your workload, you may get better performance out of a similar number of SATA spindles that are direct-attached to a machine versus attached via some form of network; e.g. single-threaded, large-block sequential workloads usually benefit from the lower latency provided by a PCIe-attached RAID controller. Virtualisation workloads, on the other hand, are very rarely like that: they typically involve a large number of simultaneous random reads and writes. For that pattern, the optimisations that are possible from an intelligent controller in a spindle-constrained environment usually provide a significant performance benefit.
While I don't have the bandwidth to help you troubleshoot your issue, there are a number of free tools available from NetApp, such as OnCommand Performance Manager, that will help you isolate where the performance problem might be, and you can always open a performance troubleshooting ticket with NetApp support.
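As a quick first look before reaching for those tools, sysstat on the controller console will show you cache hit rate, CP behaviour and disk utilisation in real time:

    # one summary line per second; watch the "Cache hit" and "Disk util" columns
    sysstat -x 1

A low cache hit rate together with high disk utilisation points at the spindles; a high hit rate with poor latency points somewhere else in the stack (network, fabric, or host).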
Regards
John Martin
Principal Technologist - NetApp ANZ.
If you have the 10GbE NICs bonded as an IFGRP or VIF or whatever they call them these days, unbond them. IFGRP plus iSCSI is a no-no. Bind both 10GbE NICs to the iSCSI service on each of the NetApp controllers individually and set them for RR multipathing over jumbo frames (enabled end-to-end) and it'll work properly. At least the pathing will.
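On the ESXi side that looks roughly like the following; the vmhba/vmk names and the naa ID are placeholders for your environment:

    # bind both iSCSI vmkernel ports to the software iSCSI adapter
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2
    # set the round-robin path selection policy on the LUN
    esxcli storage nmp device set --device naa.60a98000xxxxxxxx --psp VMW_PSP_RR
    # verify jumbo frames end-to-end (8972 = 9000 minus IP/ICMP headers)
    vmkping -d -s 8972 <iscsi_target_ip>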
Hi,
why is it a "no-no" to bind the 10GbE NICs into a VIF/IFGRP on the NetApp filer with iSCSI? That's the configuration I have at the moment.
Each individual network link is a path to the LUN. You can't aggregate them and then try to multipath over them. Break the IFGRP and bind each interface to iSCSI individually. I had a 2240-4 in a previous job and we were getting ~2800 IOPS on the random 8k test with 16x 1TB SATA disks in an aggregate when it was configured properly (binding 2 of the interfaces to the iSCSI service with no IFGRP).
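On the filer side (7-Mode), breaking the aggregation out looks roughly like this; the interface names and IPs are just examples, and you'd want a maintenance window since the links bounce:

    # remove the aggregated interface, then bring the members up individually
    ifgrp destroy ifgrp0
    ifconfig e1a 10.0.10.11 netmask 255.255.255.0 mtu 9000 up
    ifconfig e1b 10.0.10.12 netmask 255.255.255.0 mtu 9000 up
    # make sure the iSCSI service is bound to both interfaces
    iscsi interface enable e1a
    iscsi interface enable e1b
    iscsi interface show
    # mirror the ifconfig lines in /etc/rc so they survive a reboot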
Hi,
here is a screenshot of my config
and here is the iSCSI binding of the LUN in VMware
What configuration would you use? Can you please describe it for me?
Thanks
macomar
Remove the VIF on the NetApp side and remove the port channel on the switch side. Do not aggregate links with iSCSI.
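Once the VIF and the port channel are gone, you can confirm the host now sees each controller interface as its own target portal (the adapter name is a placeholder):

    # each unbundled 10GbE interface should show up as a separate portal
    esxcli iscsi adapter target portal list --adapter vmhba33
    # rescan and check that the extra paths per LUN appear
    esxcli storage core adapter rescan --all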
Hi,
I am designing a new system with 10Gbps links to two switches for an iSCSI datastore on VMware. Should these links not be aggregated into a port channel, but rather set up active/passive for redundancy? Can anybody explain to me why a port channel is a bad idea?
stuv
Hi stuvstuv, that's an interesting question. I would like to have an explanation too!
