VMware Cloud Community
smokeyrd
Contributor

Slow write performance, only in VMware

I have a rack of Dell 2950s that are all cooperating under ESXi 5.1, connected to an iSCSI LUN on another 2950 running FreeNAS with 6x2TB 7200RPM Seagates. I just added another 2950 with an LSI 9240 that breaks out to an SGI/Rackable SE3016 loaded with 16x73GB 15K SAS drives.

Drive/Network performance metrics:

Sample DD command:

dd if=/dev/zero of=/home/rob/ddfile bs=2048k count=10000

Note: on internal copy jobs I used count=10000 (about 20 GB); for anything over the network I used count=1000 (about 2 GB).

Current Production NAS (iSCSI/SATA):

     Iperf: 920 Mb/s Avg on 4 threads

     DD 20 gig self-copy: unavailable, because the device extent is on the iSCSI share and there is no "internal" storage to copy to. Current peak drive in/out is 72/74 MB/s. That is not representative of the actual max, since I am unable to spin everything down at this time, so treat it as an average rather than the max capacity of the drives themselves.

New NAS (NFS/SAS):

     Iperf: 920 Mb/s Avg on 4 threads

     DD 20 gig self-copy: 20971520000 bytes transferred in 36.391064 secs (576282135 bytes/sec)

DD from old device to mounted NFS share hosted from new device: 2097152000 bytes transferred in 30.661172 secs (68397647 bytes/sec)

Any operation within vSphere results in speeds similar to this test DD from a VM hosted on the SAS drives: 20971520000 bytes (21 GB) copied, 2765.61 s, 7.6 MB/s
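For reference, the dd summaries above can be turned into directly comparable MB/s and Mb/s figures. The sketch below is only an illustration: the function name `dd_rate_mb_per_s` and the two regular expressions are assumptions written to match the two summary formats quoted in this post (the "bytes transferred in ... secs" style and the "bytes ... copied, ... s" style), not part of any tool.

```python
import re

# Summary style seen on the NAS boxes: "N bytes transferred in S secs (R bytes/sec)"
BSD_DD = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs")
# Summary style seen inside the VM: "N bytes (21 GB) copied, S s, 7.6 MB/s"
GNU_DD = re.compile(r"(\d+) bytes .*copied, ([\d.]+) s")


def dd_rate_mb_per_s(summary):
    """Return throughput in decimal megabytes per second for one dd summary line."""
    match = BSD_DD.search(summary) or GNU_DD.search(summary)
    if match is None:
        raise ValueError("unrecognised dd summary: %r" % summary)
    nbytes = int(match.group(1))
    seconds = float(match.group(2))
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    samples = [
        "20971520000 bytes transferred in 36.391064 secs (576282135 bytes/sec)",
        "2097152000 bytes transferred in 30.661172 secs (68397647 bytes/sec)",
        "20971520000 bytes (21 GB) copied, 2765.61 s, 7.6 MB/s",
    ]
    for line in samples:
        rate = dd_rate_mb_per_s(line)
        # Multiply by 8 to express the same rate in megabits per second.
        print("%8.2f MB/s  %8.1f Mb/s" % (rate, rate * 8))
```

Run against the three summaries above, this gives roughly 576 MB/s for the local self-copy, 68 MB/s for the NFS copy between the two boxes, and 7.6 MB/s (about 61 Mb/s) from inside the VM.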

It appears that "something" is going wrong wherein operations on this single datastore are extremely slow, but only when communicating with ESXi/vCenter. It doesn't appear to be an issue with protocols or hardware limitations/problems, as outside VMware everything behaves properly. I have a case open with VMware tech support but wanted to also pose this scenario to the community and see if anyone else has experienced similar problems. If anyone has any ideas as to why this may be happening, I'd be interested in trying out some scenarios!

RParker
Immortal

You have some throttling somewhere. I suspect one or more interfaces on the ESX host isn't running at 1G; it looks like Fast Ethernet.

I would trace ALL network connections from the ESX host, verify that those NICs in ESX are showing a 1G (or higher) connection, and that each port on the actual switch is in fact running at 1G or higher.

This appears to be a problem with your network setup, and not a problem with ESX.

We get good throughput, so it's definitely NOT ESX or VMware related.

I am sure VMware will tell you the same thing.

smokeyrd
Contributor

Just checked over the network settings, and the VM that is currently running on the datastore shows 1 gig internally. All hosts report 1G full on all NICs (each host has 4 NICs: 2 onboard Broadcom and an Intel dual-port). Both datastores show a 1G connection and test similarly on the DD runs. If I move a VM from the iSCSI/SATA datastore over to the SAS/NFS one, it suffers the same issue as the VM that is currently on that datastore. Moving traffic back over to the iSCSI/SATA datastore from the SAS/NFS one results in write speeds of 117 MB/s. Something is "weird" about that to me. If I misinterpreted your response, please let me know and I will take another look and reevaluate.

RParker
Immortal

NO, it appears everything is done correctly. We have iSCSI and Fiber. We have 2 separate SAN arrays, and I can consistently get speeds between 60 and 70 MB/s, slightly higher on iSCSI.

My point is that VMware or ESX isn't the problem, but the fact that you reported 7.6 MB/s suggests a 100 Mb network somewhere (I assume), since 100 Mb/s tops out around 12.5 MB/s. IF you were getting 20 MB/s, that would mean it's using the 1G bandwidth, and then I would suspect the drives or maybe something else. But the fact that it is the same speed I would expect from Fast Ethernet is very suspicious. There is a NIC (maybe the vSwitch on ESX itself is limiting) that isn't allowing the full bandwidth.
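The reasoning here comes down to unit arithmetic: a link rated in megabits per second can carry at most one eighth of that figure in megabytes per second, before any protocol overhead. A minimal sketch of that check, where the table and the helper names are made up for illustration:

```python
# Nominal link rates in megabits per second.
LINK_SPEEDS_MBIT = {"Fast Ethernet": 100, "Gigabit Ethernet": 1000}


def ceiling_mb_per_s(link_mbit):
    """Theoretical maximum in MB/s for a link rated in Mb/s (8 bits per byte)."""
    return link_mbit / 8


def smallest_link_that_fits(observed_mb_per_s):
    """Return the slowest link class whose ceiling is at or above the observed rate.

    Returns None when the rate exceeds every listed link, which points at
    local storage rather than the network.
    """
    for name, mbit in sorted(LINK_SPEEDS_MBIT.items(), key=lambda kv: kv[1]):
        if observed_mb_per_s <= ceiling_mb_per_s(mbit):
            return name
    return None


if __name__ == "__main__":
    for rate in (7.6, 68.4, 117.0, 576.3):
        print(rate, "MB/s ->", smallest_link_that_fits(rate))
```

With the figures from this thread, 7.6 MB/s fits under the 12.5 MB/s Fast Ethernet ceiling, while 68.4 MB/s and 117 MB/s can only come from a gigabit path. A rate under the Fast Ethernet ceiling is consistent with a 100 Mb/s hop but does not prove one; anything that drops or retransmits traffic can produce similar numbers.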

Did you enable jumbo frames for the ESX host and the port on the switches?  Maybe there is a problem with VLAN configuration.. something doesn't seem right with network setup....

I know you said you moved traffic, but what I am trying to ascertain is whether the actual path from ESX all the way back to the NAS/SATA array is EXACTLY the same path as from another physical host. IF you unplug a NIC from the ESX host and put that physical Ethernet cable into a physical server so they both share the exact same topology, that will prove the network is fine. But I feel as if there is a setting missing or some network that isn't set just right.

It only takes 1 little setting to mess it up...the chain is only as strong as the weakest link.

If you checked, verified, and it all looks good, then I don't know what else it can be.. but we can get really good speeds on VMware \ ESX, I know there is no problem there.

smokeyrd
Contributor

Yeah, that's the confusing bit for me...downstream from the SAS/NFS is filling a 1G pipe to the brim, but doing the same operation upstream results in the 100 Mbit-like speed. Hopefully the VMware tech assigned to the case will get back to me this morning and we can get it taken care of. I'll certainly let you know what I find out, as this doesn't seem to be an obvious config error (but may be).

You had mentioned testing the path speed, and I would expect that if I were to migrate a machine to/from the datastore it would use the same path both ways. That is the operation I was describing earlier, where upstream (to the NAS) is very slow but downstream is filling the pipe.

I don't have Jumbo Frames turned on...I had flipped the switch on them once before, but I work with another "tech" who had a heart attack because he doesn't use them at his work and thought jumbo frames would cause issues, so in the interest of keeping the peace, Jumbo Frames is off.

All of the physical servers are hooked to a set of Cisco 2950Gs with failover but no load balancing (again, the other guy doesn't like (or maybe understand) the settings for the load balancing). I thought of doing DD from within the host itself to the NFS share but alas, invalid command. I tried moving the NAS from the primary to the secondary switch after your last message, to test the "inefficient physical path" scenario, but it resulted in the same performance (if not slightly slower on the downstream due to overall network traffic and only having a 1 gig link between the switches).

RParker
Immortal

smokeyrd wrote:

I don't have Jumbo Frames turned on...I had flipped the switch on them once before, but I work with another "tech" who had a heart attack because he doesn't use them at his work and thought jumbo frames would cause issues, so in the interest of keeping the peace, Jumbo Frames is off.

These are my favorite people :) The ones that panic when it's relatively easy to google jumbo frames to see what they do. ALL they do is raise the MTU from 1500 to 9000 bytes, which allows larger frames when transmitting files TO NFS (biggest benefit). People panic for nothing, when it's clear they are GUESSING and don't take the time to actually find out what it does.
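To put rough numbers on what the MTU change buys, here is a back-of-the-envelope sketch. It assumes plain IPv4 and TCP headers with no options and no VLAN tag, and counts the fixed per-frame Ethernet cost on the wire (preamble, header, FCS, and inter-frame gap); the constant and function names are illustrative only.

```python
# Per-frame Ethernet cost on the wire, in bytes:
# preamble + start delimiter (8) + header (14) + FCS (4) + inter-frame gap (12).
ETH_WIRE_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), assuming no options.
IP_TCP_HEADERS = 40


def tcp_payload_efficiency(mtu):
    """Fraction of wire bytes that carry TCP payload at a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_WIRE_OVERHEAD)


def frames_needed(nbytes, mtu):
    """Number of full-size frames needed to carry nbytes of TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-nbytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            "MTU %d: %.1f%% efficient, %d frames per MiB"
            % (mtu, 100 * tcp_payload_efficiency(mtu), frames_needed(1024 * 1024, mtu))
        )
```

Under these assumptions the efficiency goes from about 94.9% at MTU 1500 to about 99.1% at MTU 9000, and one MiB of payload needs 118 frames instead of 719. The raw bandwidth gain is modest; the bigger effect is usually fewer frames to handle per byte moved.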

Love those people.. it's pathetic and nerve wracking, monumentally annoying, but love em...

I surmise from your "keep the peace" comment you know what I am saying..

smokeyrd
Contributor

<?rant

Yeah...it's annoying, but I have to play nice because he has certs and I don't...I work on the network and I feel I know it better than him, but he's allowed to program the switches and I'm not because he has the papers and I don't. Alas, such is life. Chances are I'll figure it out with VMware, and while I'm at it I'll change a few things (jumbo frames, VLANs, and load balancing) while I'm in there...say that VMware did it to fix the NAS...hopefully they buy it. If something breaks later on I'll eat the blame but reference the increased speed and reliability up to that point. It's easy to set up and, as you say, a googler can do it. I'm still learning DBs and starting on Perl/PHP/Scala, but switches/networks are cake compared to all that.

?>

RParker
Immortal

2 Pet peeves in a ROW!  You hit the nuisance lottery.

Another one of my "favorite" kinds of people is those with those mysterious "credentials". In this day and age, it's easy to get certs. It doesn't mean you ACTUALLY know anything; all it means is you MEMORIZED for the test.

I can go on about this ALL week, but I will spare you.  Same goes with people with degrees from Harvard and "other" schools, like somehow my piddly degree from DeVry and 15 years of work experience means ZIP.

Yeah.. I need to stop now.

Take it all with a grain of salt, and use Google to your advantage to prove him wrong (like now). Once he sees you can basically discredit his "assumptions", he will back down and you will get more credibility. You seem competent; you just need to get on-the-job experience to prove your worth. You are my kind of people.

smokeyrd
Contributor

lol, thanks for the pep talk. I'm hoping that my latest project of putting in a ticketing system (Spiceworks) along with a system monitoring system (Nagios) will help improve the perceived reliability of the network enough to get them to trust me a bit more...maybe then I will be able to convince them to send me to some official courses so I can get my certs and be deemed "worthy". Then again, we are a small business and times aren't exactly the best they've ever been...they may just say to hell with the certs and realize that experience is better than certs...assuming nothing goes wrong with my setup...if something does I'll be looking for a new job, lol.

smokeyrd
Contributor

Found the answer. Somehow, some way, VMware wanted jumbo frames and nothing else wanted jumbo, and it made "stuff" angry. I thought everything (except the switch) would auto-negotiate and tune down the frames if jumbo wasn't enabled, but that apparently was incorrect. Hope this helps someone if anyone else runs across this. Sorry for the long delay on the fill-in.
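Ethernet MTU is a per-interface setting and is not auto-negotiated, which is how a mismatch like this can sit unnoticed until large frames are sent. A small sketch of the two checks involved follows; the device names and MTU values in it are hypothetical, not taken from this setup, and the helper names are made up for illustration. On ESXi the usual way to test the path is a don't-fragment ping from a VMkernel interface, for example `vmkping -d -s 8972 <NAS address>`, where 8972 is the payload size computed by `df_ping_payload(9000)` below.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def df_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def mtu_mismatches(path):
    """Return the devices whose MTU differs from the smallest MTU on the path.

    `path` maps a device or interface name to its configured MTU.
    An empty result means every hop agrees.
    """
    smallest = min(path.values())
    return {name: mtu for name, mtu in path.items() if mtu != smallest}


if __name__ == "__main__":
    # Hypothetical inventory: only the hypervisor side has jumbo frames enabled.
    path = {"esxi-vmk": 9000, "switch-port": 1500, "nas-nic": 1500}
    print("mismatched:", mtu_mismatches(path))
    print("DF ping payload for MTU 9000:", df_ping_payload(9000))
    print("DF ping payload for MTU 1500:", df_ping_payload(1500))
```

If the large don't-fragment ping fails while the 1472-byte one succeeds, some hop on the path is not passing jumbo frames, and either every hop needs the larger MTU or the sender needs to go back to 1500.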

mcowger
Immortal

Glad you figured it out.

It's also worth mentioning that 'dd' is probably the single worst way to test drive performance, because its access pattern and write methods are not even close to how 99% of applications work.

--Matt VCDX #52 blog.cowger.us
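As a rough illustration of the point about 'dd': a dd from /dev/zero issues large sequential writes of all-zero data, while many applications issue small writes at scattered offsets and wait for them to reach disk. The sketch below is a toy under those assumptions, not a benchmark tool; `timed_writes` and its parameters are hypothetical names, it assumes a POSIX-like system, and a purpose-built tool such as fio is a better choice for real measurements.

```python
import os
import random
import tempfile
import time


def timed_writes(path, block_size, count, random_offsets, sync_each):
    """Write `count` blocks of `block_size` bytes and return throughput in MB/s."""
    block = os.urandom(block_size)  # non-zero data, unlike /dev/zero
    offsets = [i * block_size for i in range(count)]
    if random_offsets:
        random.shuffle(offsets)  # scatter the writes across the file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    start = time.perf_counter()
    try:
        for offset in offsets:
            os.lseek(fd, offset, os.SEEK_SET)
            os.write(fd, block)
            if sync_each:
                os.fsync(fd)  # wait for each write to reach stable storage
        os.fsync(fd)  # flush whatever is still buffered before stopping the clock
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "testfile")
        # dd-like: a few large sequential writes, flushed once at the end.
        seq = timed_writes(target, 1024 * 1024, 16, random_offsets=False, sync_each=False)
        # Application-like: many small scattered writes, each one synced.
        rnd = timed_writes(target, 4096, 256, random_offsets=True, sync_each=True)
        print("sequential 1 MiB blocks: %.1f MB/s" % seq)
        print("random 4 KiB synced:     %.1f MB/s" % rnd)
```

The two figures usually differ by a wide margin on the same storage, which is why a single sequential dd number says little about how a VM workload will behave.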