heikki_m
Contributor

ESXi 3.5 management network very slow

Hello,

I'm having problems with ESXi (3.5 U2 latest, both embedded and installable) on three different hosts. Hardware is HP DL380 G5. Both NICs on every server are connected to 1000FDX ports without any duplex issues. ESXi network configuration is the default: both vmnic0 and vmnic1 are used for VM Network and Management Network. Switches show no errors on the ports.

VM Network is not showing any performance problems. I'm getting a steady 30-40 MB/s to and from guest machines.

Accessing the management network (copying to a datastore, Converter access, downloading the VI Client, etc.) is painfully slow, ranging from 100 kB/s to 3 MB/s and usually around 1 MB/s. Needless to say, this is very frustrating when, for example, converting existing virtual machines to the ESXi hosts.

Any idea where to start looking for a solution?

107 Replies
dragin33
Contributor

Update to perc3 card:

Nope. I managed to get a Linux VM loaded onto the server and ran many I/O tests for network, disk, and memory performance. I didn't see any real problems between this server and any other. The only problem is that anything I copy over the management NIC gets dog slow. This is still happening, and I am still getting I/O errors when trying to copy to the ESXi server. My network team says there is nothing wrong on the network.

KBuchanan
Enthusiast

dragin33: Where are you seeing the IO errors? ...system log files?

I strongly doubt there is a network problem...just like I strongly doubt any VMware people are actually reading this (...if so, why haven't they responded??). I do believe, however, that the management network is purposefully capped at some arbitrary limit imposed by VMware.

After all, it is a management interface, and ideally you would have the images stored on a SAN, where there are other means of backing up and protecting them. I'm not defending it...because I am a victim too. ...I'm just calling it like I see it.

Suggestion: Use Veeam's FastSCP. The public beta release of 3.0 supports ESXi, and I am getting a sustained 30 MBytes/sec with their software. With WinSCP I only get about 3 MBytes/sec, and PSCP.exe fails after 2-3 minutes. You can find the download in the Forum.

duonglt
Contributor

Hmm, how about trying to use a Linux NFS server? Takes about 15 minutes to install it and get the NFS server up and running.

VMware ESX 3.x and ESXi Scripts & Resources: http://www.engr.ucsb.edu/~duonglt/vmware

KBuchanan
Enthusiast

I've tried NFS on Linux and Microsoft (SFU)...same performance. ALTHOUGH...once the image is there, it loads and runs quickly, and file copies to the VM (i.e., the filesystem in the running VM) are reasonable and "as expected".

It's just that copying over the management interface really stinks.

duonglt
Contributor

What were your export settings for the Linux NFS server?

VMware ESX 3.x and ESXi Scripts & Resources: http://www.engr.ucsb.edu/~duonglt/vmware

giulianozo
Contributor

Hello,

Excluding SCP and NFS (and FastSCP, as it doesn't run on Linux), what are the options to transfer files to ESXi?

I was thinking about using an external USB disk, but it seems that's not supported. DVDs are not an option for large files.

Can you use a non-management interface to transfer files?

If not, can you access the datastores from within a VM without using the management interface?

Any other experiences with different transfer types? What if I connect the ESXi server and the NFS server with a crossover cable (sorry, I can't try this because I'm out of the office)?

Thanks,

giuliano

dragin33
Contributor

I am seeing an actual error message pop up from the VMware management console when I go into the datastore browser and try to copy a large file in or out. It just says "IO Error". Sometimes the error comes immediately; sometimes it comes after a long (and slow) stretch of copying.

I would like to try fastscp but my network security peeps have it blocked.

dragin33
Contributor

I'm sorry to say I've just tried the Veeam FastSCP and I'm not seeing any better speed. :(

patj3
Contributor

Same here. I've got ESX 3i installed from the CD on a Dell PowerEdge 2950.

Uploading to the datastore goes through the management LAN and seems to be capped at 2500 KBps.

If I start a second upload with FastSCP, the second upload runs at about 4000 KBps, and looking at the performance monitor I can see the network utilization go up to 6500 KBps.

If anyone can come up with a solution, that would be nice, although my 200 GB VM is almost copied after 3 days. :)

edit: i'm running v 130755 atm
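
For a rough sense of scale, the capped rate above can be turned into an expected copy time. This is only a back-of-the-envelope sketch: the function name `transfer_hours` is made up for illustration, and it assumes decimal units (1 GB = 1,000,000 KB) and a perfectly sustained rate, neither of which is stated in the post.

```python
def transfer_hours(size_gb: float, rate_kb_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate of rate_kb_s."""
    if rate_kb_s <= 0:
        raise ValueError("rate_kb_s must be positive")
    size_kb = size_gb * 1000 * 1000  # assumption: 1 GB = 1,000,000 KB
    return size_kb / rate_kb_s / 3600


# Rates reported in the post: a single upload, and two uploads combined.
for rate in (2500, 6500):
    print(f"200 GB at {rate} KB/s: {transfer_hours(200, rate):.1f} hours")
```

Under these assumptions a single stream at 2500 KBps would need a bit under a day, so a three-day copy suggests the sustained rate was well below the cap, or that the transfer had to be restarted along the way.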

koit
Contributor

I had the same performance results as you, but between two ESXi servers.

I tried on several weekends to move a large VM, but had to abort each time for lack of time.

I tried Converter, FastSCP, SCP and so on.

My solution was to install an eval copy of VC and import the two ESXi hosts into it.

Then I did a datastore copy (not a storage migration) between the two.

My transfer speed went from about 6 MB/s to 35-40 MB/s.

In my opinion this proves that ESXi can perform well on the management interface, but it seems to behave differently when VC is involved.

TechFan
Contributor

I wonder if it is really limited by the drivers in ESXi...also, if capped, it must be capped at a percentage of host CPU or something...because I see varying speeds reported.

Dave_Mishchenko
Immortal

I've logged a support request about this (SR # 1155278901). In my lab I have ESXi (latest build) managed by VC 2.5 U2. I saw better speeds (2x) when the VI Client was connected via VC than when it was connected directly to ESXi. In both cases the VI Client connects directly to the ESXi host to transfer the files, but when the transfer was initiated from the VC connection it was much faster. I saw the same thing with FastSCP 3.0 (beta).

Download

direct - 46 seconds - 595 MB file - 13 MB/s

via VC - 21 seconds - 595 MB file - 28 MB/s

Upload

direct - 648 seconds - 5367 MB file - 8.3 MB/s

via VC - 320 seconds - 5367 MB file - 16.8 MB/s
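
The rates quoted above follow directly from file size divided by elapsed time. A minimal sketch of that arithmetic, using only the sizes and timings listed in this post (the helper name `throughput_mb_s` is invented for illustration):

```python
def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s for size_mb copied in the given seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


# (label, size in MB, elapsed seconds) as reported above
runs = [
    ("download direct", 595, 46),
    ("download via VC", 595, 21),
    ("upload direct", 5367, 648),
    ("upload via VC", 5367, 320),
]

for label, size_mb, seconds in runs:
    print(f"{label}: {throughput_mb_s(size_mb, seconds):.1f} MB/s")
```

In both directions the VC-initiated transfer works out to roughly twice the direct rate.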

gi-minni
Contributor

I ran into the same issue and had a big problem cloning a typical 20 GB image onto several IBM BladeCenter H systems. I wrote an ash script that uses rsync and scp to clone images in cascade. I observed that cascading 3 hosts at a time gave me the best performance.

My observed performance is approx. 30-40 MB/s.

I am still surprised that this thread has had no comments from the VMware support people so far.

@VMware: Is there someone who can clarify this issue and advise the community on the best way to import, copy, or clone images with ESXi?

dragin33
Contributor

Dave,

Any update on the ticket you opened? I recently updated to the 14129 version but I don't think there is much improvement.

Dave_Mishchenko
Immortal

They haven't been able to replicate it themselves. Have you seen the same problem?

giulianozo
Contributor

Same problem with the latest version. :(

KBuchanan
Enthusiast

Dave:

They haven't been able to duplicate this problem?? Are they smoking something or have they been reading the hundreds of threads all over the Internet that (seemingly) everyone has this SAME issue??

Oh well...I don't think they want to officially admit that they have engineered some type of "BW-cap" into the VMKernel interface. I could understand if they dedicate the management interface...BUT...if we add another VMKernel interface we should be able to configure the VMKernel interfaces as either a "MGT" interface or a "DATA" interface. It makes sense that the management interface isn't flooded with data traffic...BUT @VMWARE SHOULD GIVE US AN ALTERNATIVE!!

KBuchanan
Enthusiast

This is for benefit of someone at VMWare who has lost their way and found themselves in the middle of our discussion on the "Management Interface".

First, a description of the system:

ESXi 3.5.0 Build 123629

DL350 G5 with 2 Dual-Core 1.6GHz processors

32GB RAM

P400 Controller with 128MB Cache

Array #1: 3 76G 15k SAS Drives (Raid-5)

Array #2: 3 142G 15k SAS Drives (Raid-5)

6 Gig NICs

EMC Celerra NS20 NAS

146G 15k Drives configured as a 4+1 Performance array

We run six VMs on local array #1. Nightly, we use a script to snap the images and then "hot clone" each image to the 273G local array #2. We then copy the cloned image to the EMC NS20 NFS array and delete the snapshots. We keep two copies of the images on local storage and two copies on the EMC NS20 NFS array.

...and for the finale!

Size of VMs on Array #1: 131 GBytes

Time to clone to Array #2: 35.8 minutes

Speed MB/sec (local copy): 61.1 MB/sec

Time to clone to NS20 NFS Array: 41.5 minutes

Speed MB/sec (NFS copy): 52.7 MB/sec

So, there is only about an 8.4 MB/sec difference between copying between the local disk volumes and copying to the NFS share. Someone who is more of a "hardware guru" may be able to explain or justify these numbers as realistic. It isn't bad...BUT...it isn't great either.

Cloning 131GByte of images in 35 minutes locally vs 41 minutes "over the wire" to NFS. Can someone validate that is or is not a reasonable level of performance?
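
As a sanity check on the two clone runs, the rates and the relative slowdown can be recomputed from the size and times given above. This is a rough sketch that assumes 1 GB = 1000 MB (the post does not say which convention was used), so the results land close to, but not exactly on, the quoted figures; the names `rate_mb_s`, `local`, and `nfs` are illustrative only.

```python
def rate_mb_s(size_gb: float, minutes: float) -> float:
    """Average rate in MB/s, assuming 1 GB = 1000 MB."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return size_gb * 1000 / (minutes * 60)


local = rate_mb_s(131, 35.8)  # clone to local array #2
nfs = rate_mb_s(131, 41.5)    # clone to the NS20 NFS array

# Relative slowdown of the NFS copy compared with the local copy.
slowdown_pct = (local - nfs) / local * 100

print(f"local: {local:.1f} MB/s, NFS: {nfs:.1f} MB/s")
print(f"NFS is about {slowdown_pct:.0f}% slower than the local copy")
```

Expressed this way, going over the wire to NFS costs on the order of 14% compared with the local clone.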

TechFan
Contributor

That sounds like good results to me. Most people are complaining about getting around 4 MB/s. We can usually get 20 MB/s to our NFS server from our ESXi server...we get 50 MB/s from another NFS client.

Tangata
Contributor

These appear to be very good results you are reporting. I suspect the difference is that you are running a script, while I believe most of us in this thread are attempting to use the Virtual Appliance Export function or the Datastore Browser to back up the VM to external storage.

I guess the bottom-line is, what utility are you using to actually clone the VM?

Lou

P.S. My test VM is approximately the same size, but it takes 5+ hours to complete using the Export function or the Datastore Browser. And that's only if it doesn't terminate and die with an I/O error in the middle of the operation.
