All,
I'm running through a PoC project with a customer using View 5.1, following the guidelines in "VMware Reference Architecture for Stateless Virtual Desktops on Local Solid-State Storage with VMware View 5" as closely as we can.
http://www.vmware.com/resources/techresources/10278
I have eight nodes all identically configured with local SSD storage - again all identical model drives, all empty.
Creating a pool of linked clones across this group results in some very uneven placement of the images.
As an example, if I create a pool of 80 VMs on the 8 local disks the distribution will be something like...
Disc 1 - 30
Disc 2 - 5
Disc 3 - 0
Disc 4 - 25
Disc 5 - 0
Disc 6 - 10
Disc 7 - 10
Disc 8 - 0
If I then remove 30 VMs from the pool it might delete all 30 from disk 1. At the moment every action on the pool has to be manually controlled.
Has anybody else come across this behaviour?
Someone can correct me if I'm wrong but I'm told that this is how it's designed.
When you compose a number of desktops the load-balancing decision is made there and then, so all the desktops will be placed on the datastore/host(s) that has the least perceived load.
Because you are using local storage there is no opportunity to rebalance the desktops.
Could you describe how you have designed the pools and storage?
Where are the parent and linked clones placed?
// Linjo
Hi Linjo
The pool is set up for non-persistent linked clones from a Windows 7 gold image.
The pool uses eight local SSD drives, one in each node. We are running the test with 80 machines at the moment, although we originally sized for 300, but the result was chaotic. Each local SSD drive has a copy of the replica with the gold image, provided from an iSCSI source visible to all the nodes.
No DRS of course.
The only thing that might be off centre is the fact that each node is configured to store the linked-clone swap file on a second local SATA drive to save space on the SSD. This seems to work fine.
Thanks
Someone can correct me if I'm wrong but I'm told that this is how it's designed.
When you compose a number of desktops the load-balancing decision is made there and then, so all the desktops will be placed on the datastore/host(s) that has the least perceived load.
Because you are using local storage there is no opportunity to rebalance the desktops.
I understand why it's not going to rebalance - I just don't understand why two of the datastores (all identical), hosted by identical servers, have such a high perceived load that nothing is placed on them?
What's the algorithm that's being used to work this out?
Even if this is the case, how can the Reference Architecture use such random placement to support a design that's supposed to scale to thousands of desktops?
Again, I could be wrong here, but I think it just chooses the one with the least load and sticks them all on there; it does the load query just the once.
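To make that "load query just the once" idea concrete, here is a hypothetical sketch (not View's actual code; the datastore names and load numbers are invented) of a decide-once placement: the loads are snapshotted a single time and every desktop in the request goes to whichever looked least loaded.

```python
# Hypothetical "decide once" placement sketch. The load snapshot is
# taken a single time and never refreshed, so every VM in the request
# ends up on the same datastore.

def place_once(num_vms, datastore_loads):
    """Snapshot the loads once, then put every VM on the minimum."""
    target = min(datastore_loads, key=datastore_loads.get)
    return {vm: target for vm in range(num_vms)}

loads = {"ssd1": 2, "ssd2": 0, "ssd3": 1}  # queried once, up front
placement = place_once(8, loads)
# Every VM lands on "ssd2", even though spreading them would be trivial
# if the loads were re-queried after each placement.
```

Re-querying the loads after each VM (or each small batch) would give the round-robin spread you would naively expect from identical, empty drives.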
I can't really disagree with that looking at the placement of the images - the only problem is that if you buy into this, you also have to accept that VMware's own Technical Reference Document describes a configuration that cannot be operated in reality.
Which is going to give me an embarrassing conversation with the customer.
I know all this because I've had the same embarrassing conversation.
Ahhhhhhh.... great. You've made my day. Anybody from VMware like to comment on this thread?
I was using Atlantis ILIO, so each host sees the ILIO appliance as local storage. It was the guys from Atlantis who told me about the load-balancing decisions, hence why I was almost prepared to be proven wrong.
In that case the Technical Reference, which was only updated in September last year, makes some claims with regard to scaling up that I believe cannot be supported. So it's at best wrong - at worst wilfully misleading.
Does anybody know whether the situation has changed with 5.2?
Maybe we could publish a template for that embarrassing conversation... we are going down the same road... 30 to 50 hosts... all SSD.
For the swap files we noticed two of them: one for memory swap (the per-VM .vswp, which we eliminated by making a 100% memory reservation in the master settings; our RAM was sized one to one), and a second one (the new vmx-*.vswp), which we eliminated with a VM setting. This keeps SSD usage to a minimum.
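For reference, the .vmx entries involved look roughly like this (the 2048 MB size is an illustrative assumption; `sched.swap.vmxSwapEnabled` is the documented switch for the vmx-*.vswp overhead swap, and a full reservation is what removes the per-VM .vswp):

```
memSize = "2048"
sched.mem.min = "2048"                # 100% reservation -> no per-VM .vswp
sched.mem.minSize = "2048"
sched.swap.vmxSwapEnabled = "FALSE"   # disable the vmx-*.vswp overhead swap
```

In practice you would set the reservation in the vSphere Client on the master image rather than hand-edit the .vmx, so the clones inherit it.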
Eric,
Sorry to hear about your 50 nodes... at least I only have eight! Have you thought of asking VMware for your money back?
How are you rebuilding the pool? We do it manually by 'masking' out each drive in turn, but it takes all weekend. Have you come up with anything better?
Again, I would like to send out an open invitation for any VM forum expert, moderator or VMware employee to defend the 'Reference Architecture' as a solution that can be practically implemented.
If not, can we remove it from the site please.
The pilot (with two hosts) has been accepted by the client. The bigger infrastructure does not exist yet. We will certainly look into it.
Eric
The uneven placement of VMs during provisioning is an issue that has existed for a while and becomes more pronounced for smaller pools. There are two ways to get around this:
1) 5.2 has made improvements in this area and upgrading to it should help you.
2) If an upgrade is not possible, I would suggest running the provisioning at a lower concurrency level. Edit the VC from the View Admin UI and change "Max concurrent View Composer provisioning operations:" to a lower value (2, or maybe even 1, as this is a small pool). The default is 8. This means the overall provisioning time will suffer, but that is most likely a one-time operation. Doing this should produce more balanced results.
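A toy model of why lowering the concurrency helps (the one-load-unit-per-clone accounting and the once-per-batch refresh are assumptions, not View's documented behaviour): if the load snapshot is only refreshed between batches, and a whole batch is placed on one datastore, then smaller batches mean fresher data for each decision.

```python
# Hedged model of batched provisioning: the least-loaded datastore is
# re-queried once per batch, and the whole batch lands on it.

def provision(num_vms, datastores, concurrency):
    counts = {ds: 0 for ds in datastores}
    remaining = num_vms
    while remaining > 0:
        batch = min(concurrency, remaining)
        target = min(counts, key=counts.get)  # snapshot refreshed per batch
        counts[target] += batch               # whole batch placed together
        remaining -= batch
    return counts

stores = ["ssd%d" % i for i in range(1, 9)]
print(provision(80, stores, concurrency=8))  # lumpy placement
print(provision(80, stores, concurrency=1))  # even: 10 VMs per datastore
```

With concurrency 1 every placement decision sees up-to-date loads, so identical empty drives fill round-robin; larger batches magnify whatever staleness there is in the snapshot.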
Thank you for the reply.
Can you define "smaller sized pools"? How big would the pool have to be to be balanced?
We have seen it fail with a pool size of 300.
However, I can see how this might work, so it's worth a try.
What will happen if I ask the pool to reduce in size? Will this setting help?
Hadn't seen your other comment about the 300-sized pool. I agree that I wouldn't call a 300-VM pool small, but I would like to know what you meant by "chaotic". Was it as unbalanced as the 80-VM pool? Also, do all the datastores have the same size/free space to begin with?
I believe reducing the pool size starts by removing the lower-numbered VMs. Now if all the initial VMs got placed on the same datastore, then you wouldn't like the results of reducing the pool size. I would recommend trying out with the lowest concurrency in 5.1.
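A small illustration of that shrink behaviour (the lowest-numbered-first deletion order is taken from the comment above; the placement itself is invented): if VMs 0-29 were all provisioned onto one datastore, shrinking the pool by 30 empties that datastore alone.

```python
# Toy pool: first 30 VMs on one datastore, the rest on another.
placement = {vm: ("ssd1" if vm < 30 else "ssd2") for vm in range(80)}

def shrink(placement, remove):
    """Delete the lowest-numbered VMs first, as pool reduction does."""
    for vm in sorted(placement)[:remove]:
        del placement[vm]
    return placement

shrink(placement, 30)
# All 30 deletions came off "ssd1"; "ssd2" is untouched - the same
# lopsided result reported earlier in the thread.
```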
We started off with a 300 pool - not expecting any issues. We are now running with a reduced number, as it's more manageable, until we can get a solution sorted.
When I say chaotic, I mean I could see no pattern to the placement. Some drives were heavily loaded, some very light. I suspect it was more balanced than the 80-image pool, but statistically you would expect that anyway. All drives were clean to start.
I will try the low concurrency setting on the pool when I can get access to the test rig.
I would recommend trying out with the lowest concurrency in 5.1
Does that follow in 5.2 as well?
I would recommend trying out with the lowest concurrency in 5.1
>Does that follow in 5.2 as well?
This is specifically for 5.1 and previous releases. 5.2 should place desktops correctly even at higher concurrency levels.
I'm going through the same issue with a customer today and was reminded of this thread.
What are the improvements in 5.2?
How is the initial calculation made? Is it free space on the datastore, or available resources on the host?