VMware Cloud Community
TravisT81
Contributor

ESXi Architecture Optimization

Hi all,

I've been running ESXi for about a year now, and I'm looking to optimize things a bit (my whole network, not just ESXi).  I am running ESXi 4.1.0 on a home-built box with an AMD Phenom II X4 945 processor and 8GB of RAM.  I currently run 4 VMs actively (and occasionally test another here and there).

This is set up on my home network, and I use it for a mix of production (personal web services at home) and test lab (to implement and try new technology).  It services a handful of computers and two users (my wife and me).  The VMs do everything they are supposed to do, but I would like to increase performance and better understand how to optimize a virtual infrastructure.  When connecting to the VMs via Windows RDC, they are very slow, and I would like to speed this up.

I have attached my current Visio drawing of my network.  This is a first draft of the diagram, and I intend to improve on it as I go, but it should give a pretty good idea of what I have set up.  If there is any pertinent information that I left off, please let me know so I can have a good working diagram.

I think much of my problem may be related to my NAS, which I will be optimizing as well (and I welcome any advice on that from you guys/gals).  I'm running an OpenSolaris build (EON Storage), also on home-built hardware.  Currently, I have a single-disk ZFS dataset that I'm storing my VMDK files on, which is mounted via NFS on the VM host.  I suspect that most of my performance issues are related to this.
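For reference, the share is set up roughly like this on the EON box, and then mounted as a datastore on the ESXi side (the pool/dataset names and IP here are illustrative, not my exact ones):

zfs create tank/vmstore
zfs set sharenfs=on tank/vmstore

esxcfg-nas -a -o 192.168.1.50 -s /tank/vmstore VMStore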

The network infrastructure is all gigabit, using a Cisco 3750 switch.  See the diagram for more details.

I'll be digging through the forums and any documentation I can find for best practices, but if anyone would be so kind as to make suggestions, I would appreciate it!  Thanks.

**Note:  The attached PDF is my Visio diagram.  There are currently two pages, so make sure to scroll to the second for the VM network configuration.

Dave_Mishchenko
Immortal

One question about the diagram - where is vmnic0?

What kind of NICs are you using?   I would take a look at using esxtop to see if you're experiencing storage issues.  Here are a couple of docs to get you started -

http://kb.vmware.com/kb/1008205

http://communities.vmware.com/docs/DOC-9279

How are the memory and CPU doing on the host?  When you're using esxtop, take a look at the CPU counter %WAIT.  That gives an indication of whether the VM is waiting for host resources.  Take a look at the section in the 2nd link that addresses %WAIT - %IDLE.
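If it helps, you can run it interactively from the console (or resxtop remotely) and also capture a batch run for offline review - the interval and sample count below are just examples:

esxtop                               (press c for CPU, m for memory, d/u/v for the disk adapter/device/VM views)
esxtop -b -d 5 -n 60 > perf.csv      (batch mode: one sample every 5 seconds, 60 samples)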

TravisT81
Contributor

Dave,

Thanks for the reply.  vmnic0 is the onboard NIC; due to problems I had with the onboard NIC on my file server, I installed an Intel E1000-based NIC.

I quickly ran esxtop, which shows some high numbers for %WAIT (anywhere from 5800 [helper] down to 99 for several services).  My VMs show about 400 in this column.  Memory is pretty taxed, but not completely utilized.  I added an additional 4GB stick (going from 4GB to 8GB total), which allowed me to dedicate more resources to each VM and helped ever so slightly, but it's not where it should be IMO.  CPU utilization is typically really low - right now I'm at 224 MHz and 7047 MB as reported in the vSphere Client.

I'll check out the links more, but wanted to answer your questions.  I don't doubt that my NAS is causing some of the slowdowns, and I am considering different options for open-source NAS solutions.  My Samba shares for domain users are suffering in speed as well, and they are on a raidz ZFS dataset spanning four 1TB disks (which should outperform the single-disk setup).  Again, recommendations here are welcome.
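While I'm at it, I'll grab some per-disk numbers off the NAS during a transfer with something like this (pool name illustrative):

zpool status tank            (check pool health first)
zpool iostat -v tank 5       (per-vdev throughput and IOPS, sampled every 5 seconds)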

Thanks!

Dave_Mishchenko
Immortal

How much memory is assigned to the VMs?

TravisT81
Contributor

I've listed VM Host/Guest memory assignments on the first page of my network diagram.

They are as follows:

Host - 8 GB

Web Services - 1024 MB

DC1 - 1024 MB

DC2 - 2048 MB

Mail - 1024 MB

I've also realized that my diagram is not exactly accurate.  It shows two 802.1Q trunks from my gigabit switch to my ESXi server.  There is currently only one, since I'm not using the onboard gigabit NIC.

TravisT81
Contributor

Dave,

I've read through the links you provided quickly, and it looks like there's a wealth of info there - a little overwhelming, and I'm really not sure where to start.  I've looked at esxtop and used the "u" command to see the per-LUN mappings.  My VM storage LUN shows relatively low CMDS/s (~10-20) when I'm not accessing the machines.  As soon as I remote into a machine, it shoots up to the 150-200 range.  I'm truly not sure of the IOPS capability of my drive, but I would imagine that is probably reaching the max for a single disk.

The other thing I don't quite understand is what the %WAIT column indicates.  I'm not sure how the percentages add up either - almost 6000% for the helper process?  I don't get it.  Are my numbers a concern (see attachment)?

mcowger
Immortal

%WAIT isn't that useful a metric - it includes time the world was waiting for CPU (blocked) as well as time the world had nothing it needed to schedule (idle).  The percentages are also summed across all the worlds in a group (see the NWLD column), which is why an idle group like helper, with dozens of worlds each near 100%, can show close to 6000%.

%RDY is better to look at, and in your case it is basically 0, so you aren't CPU bound at least.

--Matt VCDX #52 blog.cowger.us
Dave_Mishchenko
Immortal

What values do you have in the Idle column?

TravisT81
Contributor

Dave,

When I run esxtop, I don't see an idle column.  I do have an idle process that is running.  See screenshot three posts up.  Is that what you're referring to?

Dave_Mishchenko
Immortal

If you expand the PuTTY screen size you should then be able to view the column.

TravisT81
Contributor

Good catch.  It didn't look like anything was getting cut off, so I would have never even tried that.  Attached are updated esxtop screenshots.

rickardnobel
Champion

TravisT81 wrote:

Good catch.  It didn't look like anything was getting cut off, so I would have never even tried that.  Attached are updated esxtop screenshots.

When using the CPU "c" view or the MEMORY "m" view, press "V" (capital) to see just the virtual machines - it gives you a much better view.  Could you post the memory screen?

For the disks, the latency values for the NFS "Software" datastore look very high.  What files are located there?  VMs, or just ISOs and the like?
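For reference when reading those counters (this is my reading of them; double-check against the esxtop docs linked above): GAVG/cmd = DAVG/cmd + KAVG/cmd, that is, the total latency the guest observes is the device/array latency plus the time spent in the VMkernel.  As a rough rule of thumb, sustained GAVG above about 20-25 ms is usually where VMs start to feel slow.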

My VMware blog: www.rickardnobel.se
TravisT81
Contributor

Sorry for the delay in getting back to you.  I've been out of town and unable to post the screenshots you requested.

I attached screenshots of the esxtop memory screen (all worlds, and just the VMs using the "V" key).

The Software datastore only contains my ISO files.  The VMs are stored on the datastore named "VMStore".

I'm having problems with my file server, which leads me to believe it is the main cause of my problems.  I am looking for replacement software to run on it, but can't find a really good solution.  I'd like to stick with something ZFS-capable.  I'm considering FreeNAS, but it still has some Active Directory integration issues that are important to me.  If anyone can suggest a user-friendly solution that integrates nicely with Active Directory and works well with ESXi, I'd appreciate it.

Also, I've researched building a virtual NAS through VMware.  I'm still not real smart on that concept, but is that something I should look into further, or are there performance issues with that?

rickardnobel
Champion

TravisT81 wrote:

Also, I've researched building a virtual NAS through VMware.  I'm still not real smart on that concept, but is that something I should look into further, or are there performance issues with that?

Here is an article about building a virtual iSCSI target server, which is great for testing and educational purposes.

http://www.rickardnobel.se/archives/26
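If you try it, the ESX(i) 4.x side comes down to roughly this from the console (the adapter name and target IP are examples; the same can be done in the vSphere Client under Storage Adapters):

esxcfg-swiscsi -e                            (enable the software iSCSI initiator)
vmkiscsi-tool -D -a 192.168.1.60 vmhba33     (add the target as a dynamic discovery address)
esxcfg-rescan vmhba33                        (rescan to pick up the new LUNs)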

My VMware blog: www.rickardnobel.se
TravisT81
Contributor

Interesting link.  I don't quite understand how that works, though.  Currently, I have an ESXi box and a NAS box.  The ESXi box boots off a USB drive and has no other local storage.  The NAS box boots off USB, but has five 1TB disks.  If the drives were moved from the NAS to the ESXi box, I could build an iSCSI target server (e.g. Openfiler), but you would have to have a datastore located on the ESXi host to boot that VM, else it wouldn't be available until Openfiler boots, correct?

Does this have any performance gains over a dedicated NAS?  I can see that it could eliminate a physical machine, but given the performance issues I'm having now (which don't seem to be network related), I'm not sure it would show any improvement.  What are your thoughts?

As for my current problem, did the memory screen I posted give any indication of problems?

rickardnobel
Champion

TravisT81 wrote:

As for my current problem, did the memory screen I posted give any indication of problems?

Just a quick reply before work: the memory screen shows nothing unusual and looks fine.

My VMware blog: www.rickardnobel.se
rickardnobel
Champion

TravisT81 wrote:

The NAS box boots off USB, but has five 1TB disks.  If the drives were moved from the NAS to the ESXi box, I could build an iSCSI target server (e.g. Openfiler), but you would have to have a datastore located on the ESXi host to boot that VM, else it wouldn't be available until Openfiler boots, correct?

It is kind of hard to give any general advice about your situation, but in theory: if you had one local disk drive in your single ESXi host, you could use a small partition of the disk for the ESXi installation (or keep it on USB) and make a VMFS datastore of the rest, around 1 TB.  Then you could try placing one or more virtual machines directly on this VMFS store and check whether there is any performance difference.  Since you have only one ESXi host, you will not actually need any virtual iSCSI/NAS.
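If you prefer the command line to the vSphere Client for that step, creating the datastore on a prepared partition looks roughly like this (the device name is just a placeholder - list yours with esxcfg-scsidevs -l):

vmkfstools -C vmfs3 -b 1m -S LocalVMFS /vmfs/devices/disks/naa.xxxxxxxx:1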

Does this have any performance gains over a dedicated NAS?  I can see that it could eliminate a physical machine, but given the performance issues I'm having now (which don't seem to be network related), I'm not sure it would show any improvement.  What are your thoughts?

Your performance problems could be related to an ineffective NAS device, but also to slow disks - and if so, it will not make any real difference whether the storage is local or across a network.

Could you post a screenshot from RESXTOP with the "d", "u" and "v" screens?

My VMware blog: www.rickardnobel.se
TravisT81
Contributor

I appreciate your help with this.  Attached are the requested screenshots.  This is without me remoted into any of the machines (which is when they seem slow).  I can also post screenshots taken while I'm accessing the machines if you are looking for an "under load" result.

Right now, my file server is using a single disk for my vmstore share, so I'm not gaining anything by using the NAS for my VM storage.  By moving all of the disks into the ESXi host and making them available to ESXi, one could be dedicated to VMFS files and the other(s) could be dedicated to the VM NAS appliance.  If I am understanding this correctly, I'm considering moving towards this, although I'm not sure my hardware will support it well.
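From what I've read, the remaining physical disks could then be handed to the NAS appliance as raw device mappings created from the CLI.  Apparently this isn't officially supported for local SATA disks, so treat this as a sketch (the device name is a placeholder):

vmkfstools -z /vmfs/devices/disks/t10.ATA_____xxxx /vmfs/volumes/LocalVMFS/nas/disk1-rdm.vmdk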

Going this route will likely give me more flexibility in tuning each server, since I can allocate resources on the fly.  It would also take a machine that runs 24/7 off my electric bill, which is a plus.

Anyway, in the mean time, let me know what you think on the screenshots.

Travis

rickardnobel
Champion

TravisT81 wrote:

This is without me remoted into any of the machines (which is when they seem slow).  I can also post screenshots taken while I'm accessing the machines if you are looking for an "under load" result.

The screens look normal, except for a high GAVG/cmd on the "u" screen, but it would be very useful to see the same screens when the VMs are under load.  Be sure to expand the PuTTY (or other client) window as far to the right as possible to get most counters visible.

Right now, my file server is using a single disk for my vmstore share, so I'm not gaining anything by using the NAS for my VM storage.  By moving all of the disks into the ESXi host and making them available to ESXi, one could be dedicated to VMFS files and the other(s) could be dedicated to the VM NAS appliance.  If I am understanding this correctly, I'm considering moving towards this, although I'm not sure my hardware will support it well.

VMFS is the filesystem you will use to store any virtual machines on directly attached storage (or iSCSI / Fibre Channel), so you would have VMFS on all the local hard drives that you put into the host.  ESXi will create a few very small partitions to store its own operating system files, but that is just a few gigabytes.

If you do so, you can just place the virtual machines directly on the local datastore and you do not need any NAS at all, either physical or virtual.  A NAS is mostly useful when multiple ESXi hosts have to connect to the same datastores.

My VMware blog: www.rickardnobel.se
TravisT81
Contributor

Sorry for the delay.  I've been out of town a lot in the last few weeks.  Here's the screenshot of the esxtop "u" screen under load.

To load these machines, I just connected to them via Remote Desktop and opened an MMC with many snap-ins loaded.  While that's not a real load on any of the servers, it's typically pretty slow to open.  They were all done pretty much simultaneously, on top of the normal (minimal) processing they are doing all the time.

I'm strongly considering moving a 1TB hard drive to the ESXi box to see how things play out in that configuration.  I may think about migrating everything over to a single box, but I'm curious how performance will be with these machines running alongside a virtual file server on the same box.  I'd like to see network throughput much higher than it is right now, and I'm not sure what to expect when running through a VM.
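To rule the network in or out before I move anything, I'll probably run a raw throughput test between the two boxes with something like iperf (assuming I can get it onto both ends; the IP is an example):

iperf -s                          (on the file server)
iperf -c 192.168.1.50 -t 30       (from a client, 30-second test)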
