VMware Cloud Community
meistermn
Expert

VMmark results: Is scaling of six-core CPUs and 8-socket platforms inefficient?

Looking at the VMmark results, I come to the conclusion that adding more cores on Intel (six-core CPUs in the Dell R900) and adding more sockets on AMD (the HP DL785) are not efficient.

As you can see, going from 16 to 24 cores (50% more cores), the VMmark result increases by only about 25%.

Also, going from 16 to 32 cores (100% more cores), the VMmark result increases by only about 50%.

HP ProLiant DL785 G5: 8 sockets, 32 total cores, 21.88 @ 16 tiles

Dell PowerEdge R900: 4 sockets, 24 total cores, 18.49 @ 14 tiles

HP ProLiant DL585 G5: 4 sockets, 16 total cores, 14.74 @ 10 tiles

IBM System x3650: 2 sockets, 8 total cores, 8.63 @ 6 tiles
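The percentages above can be recomputed from these four results. The Python sketch below was written for this discussion (it is not a VMmark tool) and defines scaling efficiency as score ratio divided by core ratio, so 1.0 would mean perfectly linear scaling. Keep in mind that the four systems use different CPU models and clock speeds, so the ratio mixes the effect of core count with architecture differences.

```python
# VMmark results listed above, keyed by total cores: (system, score)
RESULTS = {
    8: ("IBM System x3650", 8.63),
    16: ("HP ProLiant DL585 G5", 14.74),
    24: ("Dell PowerEdge R900", 18.49),
    32: ("HP ProLiant DL785 G5", 21.88),
}


def scaling(base_cores, cores):
    """Compare two results by core count.

    Returns (extra cores, extra score, efficiency): the first two are
    fractional increases, efficiency = score ratio / core ratio
    (1.0 means the score grew exactly as fast as the core count).
    """
    core_ratio = cores / base_cores
    score_ratio = RESULTS[cores][1] / RESULTS[base_cores][1]
    return core_ratio - 1, score_ratio - 1, score_ratio / core_ratio


for base, target in [(8, 16), (16, 24), (16, 32)]:
    more_cores, more_score, eff = scaling(base, target)
    print(f"{base} -> {target} cores: +{more_cores:.0%} cores, "
          f"+{more_score:.0%} score, efficiency {eff:.2f}")
```

With these numbers, 16 to 24 cores gives roughly +25% score (efficiency about 0.84) and 16 to 32 cores roughly +48% (about 0.74), while 8 to 16 cores gives roughly +71% (about 0.85).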

13 Replies
mreferre
Champion

Interesting.

I know this might sound like an IBM marketing statement, but for the sake of the discussion I need to bring in more data points. We commissioned a study using an alternative (yet similar) benchmark to VMmark that showed this:

The report doesn't state this explicitly, but if you do the math you will notice that going from a 4-socket x3850 M2 to an 8-socket x3950 M2 the scalability is (almost) linear, i.e. performance doubles.

I think it has been clear to everyone that, with the current HyperTransport implementation, AMD scalability beyond 4 sockets is pretty poor (remote memory access can require 2 hops vs. 1 hop in 4-socket configurations). I am actually surprised HP decided to publish that result, which more or less proves the theory.
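The hop argument above can be illustrated with a breadth-first search over a socket interconnect graph. Both topologies in this sketch are assumptions for illustration only: the 4-socket case is modelled as fully connected (every socket one link away from every other), and the 8-socket case as a hypothetical ring with links to the opposite socket (a Möbius ladder, 3 links per socket), which is not necessarily how the DL785 is actually wired.

```python
from collections import deque


def hop_stats(n_sockets, links):
    """Return (worst-case hops, average hops) between distinct sockets.

    links is a list of (a, b) point-to-point interconnect links;
    the graph is assumed to be connected.
    """
    neighbours = {s: set() for s in range(n_sockets)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    worst, total, pairs = 0, 0, 0
    for src in range(n_sockets):
        # Breadth-first search yields the minimum hop count from src to each socket.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, hops in dist.items():
            if dst != src:
                worst = max(worst, hops)
                total += hops
                pairs += 1
    return worst, total / pairs


# Assumed 4-socket topology: fully connected, 3 links per socket.
FOUR_SOCKET = [(a, b) for a in range(4) for b in range(a + 1, 4)]
# Hypothetical 8-socket topology: ring plus a link to the opposite socket.
EIGHT_SOCKET = [(i, (i + 1) % 8) for i in range(8)] + [(i, i + 4) for i in range(4)]

for name, n, links in [("4-socket", 4, FOUR_SOCKET), ("8-socket", 8, EIGHT_SOCKET)]:
    worst, avg = hop_stats(n, links)
    print(f"{name}: worst case {worst} hop(s), average {avg:.2f} hops to a remote socket")
```

Under these assumptions the 4-socket box never needs more than one hop, while in the 8-socket box four of the seven remote sockets are two hops away, which is the extra latency the post is pointing at.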

More interesting is the 4-core vs. 6-core result... However, consider that in your table moving from 8 to 16 cores buys you about 70% with 100% more cores, and moving from 16 to 24 cores buys you about 25% with 50% more cores. Not the end of the world, in my opinion. Also consider that the 18.49 result can only be improved. ;)

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
meistermn
Expert

Hello Massimo,

Why isn't IBM publishing more VMmark results? I want to see the x3755 M2 and the 8-socket x3950 M2.

You are right about the HP DL785. I would have published that result only once the Shanghai CPUs (with HyperTransport 3) are available. I think Sun will then also show their results for the 8-socket X4600.

I don't like the vConsolidate benchmark because it is made by Intel. Yes, it is based on four different benchmarks, but VMmark allows comparing AMD and Intel CPUs.

I know VMmark only benchmarks VMware's hypervisor, while vConsolidate benchmarks several hypervisors.

What I really want to know is whether the X4 chipset is dying with the Nehalem CPU, or whether an X5 chipset is coming with Nehalem.

mreferre
Champion

You raise some good points.

With the arrival of the Dunnington CPUs, the 8-socket x3950 M2 would be "challenging", since ESX only supports 32 pCPUs and that configuration would stack up 48 cores. I am assuming here that most end users would want the 6-core Dunnington SKUs (or for some reason would you go with the 4-core Dunnington SKUs? I am interested in your opinion).
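The arithmetic behind the 32-pCPU concern is simple. This sketch takes the 32-pCPU ESX limit quoted above and, as a simplifying assumption, treats any cores beyond that limit as unusable by the hypervisor (in practice such a configuration may simply be unsupported). The configuration labels are just descriptive strings chosen here.

```python
ESX_MAX_PCPUS = 32  # ESX limit as quoted in this thread


def core_budget(sockets, cores_per_socket, limit=ESX_MAX_PCPUS):
    """Return (installed cores, cores the hypervisor could use, idle cores).

    Simplifying assumption: cores beyond `limit` are left unused.
    """
    installed = sockets * cores_per_socket
    usable = min(installed, limit)
    return installed, usable, installed - usable


configs = {
    "8-socket x3950 M2, 4-core Dunnington": (8, 4),
    "8-socket x3950 M2, 6-core Dunnington": (8, 6),
    "4-socket x3850 M2, 6-core Dunnington": (4, 6),
}
for name, (sockets, cores) in configs.items():
    installed, usable, idle = core_budget(sockets, cores)
    print(f"{name}: {installed} cores installed, {usable} usable, {idle} idle")
```

Only the 6-core, 8-socket option overshoots the limit (48 cores against 32), which is why the choice between the 4-core and 6-core SKUs matters for that box.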

I understand the vConsolidate / VMmark issues... Unfortunately there is no standard virtualization benchmark at this point, so whichever one you pick, there will always be someone asking why you didn't pick the other. :)

This will change, as there WILL be an industry-standard benchmark, but for the time being I have to quote the motto of one of my colleagues: "the beauty of standards is that there are many of those".

I can't answer the last question for obvious reasons, but if you want to share your opinion (perhaps privately) I would appreciate it.

Massimo.

alan_vt
Contributor

Adding cores is not the same as adding processors in terms of performance. A dual-core processor is not going to perform as well as two single-core processors.

v/r,

Alan

mreferre
Champion

For the sake of this thread (and its high-level tone), my opinion is that it is (going to perform as well as two single-core processors).

There may be situations where cores on different physical CPU packages (i.e. sockets) help performance, and there may be situations where cores on the same physical CPU package help performance.

Getting into the details of a complex matter like this would require a very specific analysis of the workload patterns, etc. As I said, given the tone of this thread, I think ignoring the cores/sockets ratio is not a big deal.

Massimo.

meistermn
Expert

Hi Massimo,

VMmark results are out for the 8-socket x3850 M2 system and the 8-socket HP DL785 system.

X3850 M2 tiles 8 sockets 32 total cores

The best 4-socket system is the Dell R905 with 20.35 @ 14 tiles.

Where are the results for the IBM x3755 with 2.7 GHz Shanghai CPUs and ESXi on USB or SD flash?

The x3755 is the best 4-socket Opteron system in terms of memory (no downgrade to 533 MHz, as other vendors do).

But where is the ESXi option for the x3755? Is a new model coming in June 2009 with the Istanbul Opteron?

mreferre
Champion

Meistermn,

you know I can't comment on future products.

We provide ESXi embedded as part of specific server models rather than as an option for all server models. Right now we do not offer an x3755 model with ESXi embedded.

Massimo.

meistermn
Expert

Massimo,

Look at the Nehalem X5570 benchmark. Nehalem also brings many virtualization features: VT-x, VT-d, VT-c and PCI-SIG I/O virtualization.


mreferre
Champion

Yes, Nehalem is a good piece of technology.

It will be interesting, though, to assess (when it's out) whether it is better to go with one quad-socket, 6-core server (24 cores) or two dual-socket, 4-core servers (16 cores), given that VI3 is licensed per socket. Also consider the usual story of high-end boxes having more memory slots and I/O slots.
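A quick way to frame the licensing side of that trade-off is to count socket licenses and cores per license for each option. This is a sketch assuming one VI3 license per populated socket, as stated in the post; the `Option` class and its field names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    servers: int
    sockets_per_server: int
    cores_per_socket: int

    @property
    def socket_licenses(self) -> int:
        # Assumption: one VI3 license per populated socket.
        return self.servers * self.sockets_per_server

    @property
    def cores(self) -> int:
        return self.socket_licenses * self.cores_per_socket

    @property
    def cores_per_license(self) -> float:
        return self.cores / self.socket_licenses


scale_up = Option("1 x quad-socket, 6-core", servers=1, sockets_per_server=4, cores_per_socket=6)
scale_out = Option("2 x dual-socket, 4-core", servers=2, sockets_per_server=2, cores_per_socket=4)
for opt in (scale_up, scale_out):
    print(f"{opt.name}: {opt.socket_licenses} socket licenses, {opt.cores} cores, "
          f"{opt.cores_per_license:.0f} cores per license")
```

Both options consume four socket licenses; the difference is 6 vs. 4 cores per license, which still has to be weighed against per-core performance, memory and I/O slots, and the number of boxes to manage.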

Impressive.

Massimo.

meistermn
Expert

What is really impressive is that the 2-socket Nehalem X5570 at 2.93 GHz is as fast in the SAP SD benchmark as the 4-socket X7460 at 2.66 GHz.

This means that a 2-socket, quad-core Nehalem with Hyper-Threading (2 x 4 = 8 cores, 16 threads) is as fast as a 4-socket, six-core Dunnington (4 x 6 = 24 cores, 24 threads).

Looking at the 24-core VMmark results, Nehalem could reach a VMmark score between 16 and 19 with 12 to 14 tiles.

But what is the difference in scheduling between a 24-core (Dunnington), a 16-core (Shanghai) and an 8-core (Nehalem) system, when they all have nearly the same VMmark results?

24 cores means to me 23 VMs without contention, when every VM uses 100% of its CPU resources.

16 cores means to me 15 VMs without contention, when every VM uses 100% of its CPU resources.

8 cores means to me 7 VMs without contention, when every VM uses 100% of its CPU resources.

Also, a GHz in Dunnington, Shanghai and Nehalem is not the same.

24 x 2.66 GHz = 63.84 GHz (Dunnington)

16 x 2.7 GHz = 43.2 GHz (Shanghai)

8 x 2.93 GHz = 23.44 GHz (Nehalem)

I know the GHz comparison is worthless, but it is fun. I wish we had something like this: a (German) pound is 500 grams.
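The two rules of thumb in this post (keep one core free and give each fully busy single-vCPU VM its own core; multiply cores by clock for an aggregate GHz figure) can be written down as follows. `reserved_cores` and `vcpus_per_vm` are assumptions made for this sketch, not ESX scheduler parameters, and the rule counts physical cores only, so Nehalem's Hyper-Threading is ignored.

```python
# Platforms from the post: (name, cores, clock in GHz)
PLATFORMS = [
    ("Dunnington (4 x 6-core X7460)", 24, 2.66),
    ("Shanghai (4 x quad-core)", 16, 2.7),
    ("Nehalem (2 x quad-core X5570)", 8, 2.93),
]


def uncontended_vms(cores, reserved_cores=1, vcpus_per_vm=1):
    """Rule of thumb from the post: leave `reserved_cores` free and give
    each fully busy VM `vcpus_per_vm` dedicated cores (assumed values)."""
    return (cores - reserved_cores) // vcpus_per_vm


for name, cores, ghz in PLATFORMS:
    print(f"{name}: {uncontended_vms(cores)} busy 1-vCPU VMs, "
          f"{cores * ghz:.2f} GHz aggregate (ignores per-GHz differences)")
```

As the post says, the aggregate GHz column says nothing about how much work each GHz does on the three architectures.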

meistermn
Expert

Hi Massimo,

Nice, the benchmark for the IBM x3650 M2 is out as well.

The Intel Nehalem CPUs used in the IBM x3650 M2, FSC TX300 S5 and HP DL380 G6 are roughly 70-85 percent faster than the 2.7 GHz AMD Shanghai in the HP DL385 G5p.

The x3650 M2 is nearly as fast as the 4-socket Intel x3850 M2!
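These percentages can be checked against the SAPS column of the SAP SD listing that follows. The sketch below copies five SAPS values from that listing (the dictionary keys are just labels chosen here) and expresses them as ratios.

```python
# SAPS values copied from the SAP SD two-tier listing in this post
SAPS = {
    "IBM x3650 M2 (2 x X5570)": 25530,
    "FSC TX300 S5 (2 x X5570)": 23650,
    "HP DL380 G6 (2 x X5570)": 25000,
    "HP DL385 G5p (2 x Opteron 2384)": 13780,
    "IBM x3850 M2 (4 x X7460, Windows)": 26550,
}


def relative(system, reference):
    """SAPS of `system` as a multiple of the SAPS of `reference`."""
    return SAPS[system] / SAPS[reference]


AMD_2S = "HP DL385 G5p (2 x Opteron 2384)"
for name in ("IBM x3650 M2 (2 x X5570)", "FSC TX300 S5 (2 x X5570)", "HP DL380 G6 (2 x X5570)"):
    print(f"{name}: {relative(name, AMD_2S) - 1:+.0%} SAPS vs. {AMD_2S}")

ratio = relative("IBM x3650 M2 (2 x X5570)", "IBM x3850 M2 (4 x X7460, Windows)")
print(f"x3650 M2 delivers {ratio:.0%} of the 4-socket x3850 M2's SAPS")
```

With these values the three Nehalem systems come out roughly 72% to 85% ahead of the DL385 G5p, and the 2-socket x3650 M2 reaches about 96% of the best 4-socket x3850 M2 result in the listing.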

Date | Vendor | SD benchmark users | Avg. dialog response time (s) | Dialog steps/hour | SAPS | Order line items/hour | Operating system | RDBMS | SAP ERP release | System and CPU configuration | Memory (MB) | Certification no.

12/19/2008 | IBM | 5100 | 1.98 | 1532000 | 25530 | 510670 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM System x3650 M2, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49152 | 2008079

12/17/2008 | IBM | 4400 | 1.99 | 1322000 | 22030 | 440670 | Red Hat Enterprise Linux Server 5.2 on XEN 3.1.0 (using 24 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008077

12/17/2008 | IBM | 4386 | 1.96 | 1320000 | 22000 | 440000 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM BladeCenter LS42, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 65536 | 2008076

12/16/2008 | Dell | 4010 | 1.23 | 1286000 | 21430 | 428670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 122880 | 2008074

12/15/2008 | Fujitsu Siemens Computers | 4715 | 1.96 | 1419000 | 23650 | 473000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Fujitsu Siemens Computers PRIMERGY Model TX300 S5 / RX300 S5, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49152 | 2008072

12/15/2008 | HP | 4995 | 1.99 | 1500000 | 25000 | 500000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL380 G6, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 GHz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49125 | 2008071

12/9/2008 | Sun Microsystems | 7825 | 1.96 | 2356000 | 39270 | 785330 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Fire X4600M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 131072 | 2008070

12/2/2008 | IBM | 5300 | 1.98 | 1593000 | 26550 | 531000 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008067

11/18/2008 | IBM | 5156 | 1.97 | 1551000 | 25850 | 517000 | Red Hat Enterprise Linux 5.2 | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008066

11/14/2008 | HP | 2752 | 1.98 | 827000 | 13780 | 275670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL385 G5p, 2 Processors / 8 Cores / 8 Threads, Quad-Core AMD Opteron Processor 2384, 2.7 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 32768 | 2008065

11/12/2008 | HP | 7010 | 1.88 | 2124000 | 35400 | 708000 | SuSE Linux Enterprise Server 10 | Oracle 10g | 6.0 (2005) | HP ProLiant DL785 G5, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 131072 | 2008064

10/22/2008 | Sun Microsystems | 5800 | 1.73 | 1780000 | 29670 | 593330 | Solaris 10 | MaxDB 7.6 | 6.0 (Unicode) | Sun Fire X4600M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8360 SE, 2.5 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 131072 | 2008061

10/17/2008 | Fujitsu Siemens Computers | 5135 | 1.98 | 1543000 | 25720 | 514330 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Fujitsu Siemens Computers PRIMERGY Model RX600 S4, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008060

9/30/2008 | Dell | 501 | 1.72 | 154000 | 2570 | 51330 | Windows Server 2003 Enterprise Edition on Windows 2008 Hyper-V (using 2 virtual CPUs) | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 90112 | 2008055

9/12/2008 | HP | 5155 | 1.97 | 1550000 | 25830 | 516670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL580 G5, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008050

9/12/2008 | Sun Microsystems | 4600 | 1.94 | 1387000 | 23120 | 462330 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Fire X4450, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 81920 | 2008051

9/12/2008 | HP | 4432 | 1.99 | 1331000 | 22180 | 443670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant BL680c G5, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP E7450, 2.4 GHz, 64 KB L1 cache per core and 3 MB L2 cache per two cores, 12 MB L3 cache per processor | 65536 | 2008049

9/12/2008 | HP | 2518 | 1.99 | 756000 | 12600 | 252000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL380 G5, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5470, 3.33 GHz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | 2008047

9/12/2008 | HP | 2518 | 1.99 | 756000 | 12600 | 252000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant BL460c, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5470, 3.33 GHz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | 2008048

9/8/2008 | IBM | 9200 | 1.95 | 2770000 | 46170 | 923330 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3950 M2, 8 Processors / 48 Cores / 48 Threads, Intel Xeon Processor MP X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 131072 | 2008046

8/20/2008 | Sun Microsystems | 2100 | 1.98 | 631000 | 10520 | 210330 | Windows Server 2008 Enterprise Edition | SQL Server 2008 | 6.0 (2005) | Sun Blade X8450, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor E7340, 2.4 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 | 2008045

7/30/2008 | IBM | 545 | 1.98 | 164000 | 2730 | 54670 | Windows Server 2003 Enterprise Edition on VMware ESX Server 3.5 (using 2 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 | 2008044

7/15/2008 | Dell | 2121 | 1.73 | 651000 | 10850 | 217000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model M600, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5460, 3.16 GHz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | 2008043

7/11/2008 | HP | 3801 | 1.99 | 1141000 | 19020 | 380330 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL585 G5, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron Processor 8360 SE, 2.5 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 | 2008041

6/25/2008 | IBM | 6615 | 1.99 | 1986000 | 33100 | 662000 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3950 M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 131072 | 2008035

6/19/2008 | Sun Microsystems | 3550 | 1.94 | 1071000 | 17850 | 357000 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Blade Model X8440, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron processor Model 8356, 2.3 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 | 2008033

6/19/2008 | IBM | 3540 | 1.99 | 1063000 | 17720 | 354330 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM System x3755, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron processor Model 8356, 2.3 GHz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 | 2008032

meistermn
Expert

Oops, there is also a comparison between Xen, Hyper-V and VMware ESX in the SAP SD (SAPS) benchmark!

I read this in the following way:

1.) With Xen, 24 VMs with 1 vCPU each were used. (I do not believe that Xen can run 1 VM with 24 vCPUs at the moment.)

2.) With Hyper-V, 1 VM with 2 vCPUs was used.

3.) With ESX 3.5, 1 VM with 2 vCPUs was used.

If we build 15 VMs with 1 vCPU each, what could the SAPS be? Maybe 15 x 1285 = 19,275 SAPS?

Date | Vendor | SD benchmark users | Avg. dialog response time (s) | Dialog steps/hour | SAPS | Order line items/hour | Operating system | RDBMS | SAP ERP release | System and CPU configuration | Memory (MB) | Certification no.

12/17/2008 | IBM | 4400 | 1.99 | 1322000 | 22030 | 440670 | Red Hat Enterprise Linux Server 5.2 on XEN 3.1.0 (using 24 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor X7460, 2.66 GHz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | 2008077

9/30/2008 | Dell | 501 | 1.72 | 154000 | 2570 | 51330 | Windows Server 2003 Enterprise Edition on Windows 2008 Hyper-V (using 2 virtual CPUs) | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 90112 | 2008055

7/30/2008 | IBM | 545 | 1.98 | 164000 | 2730 | 54670 | Windows Server 2003 Enterprise Edition on VMware ESX Server 3.5 (using 2 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 | 2008044

mreferre
Champion

>But what is the difference in scheduling between a 24-core (Dunnington), a 16-core (Shanghai) and an 8-core (Nehalem) system, when they all have nearly the same VMmark results?

>24 cores means to me 23 VMs without contention, when every VM uses 100% of its CPU resources.

>16 cores means to me 15 VMs without contention, when every VM uses 100% of its CPU resources.

>8 cores means to me 7 VMs without contention, when every VM uses 100% of its CPU resources.

That's the theory behind scale-up vs. scale-out: the more engines you have, the more likely you are to have free processors to handle your workloads. This is the basis of mainframes... but admittedly ESX is not even close to the optimization that has been achieved on those boxes. As per your comment, it would be interesting to study the difference (in scheduling efficiency) between 8 very fast cores and 24 average cores. Which is better? One could speculate...
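One way to put numbers on "the more engines, the more likely a free processor" is a toy model, which is an assumption and not how the ESX scheduler works: each of `vms` single-vCPU VMs is independently runnable with probability `p`, and contention occurs whenever more vCPUs are runnable than there are cores. Keeping the VMs-per-core ratio and per-VM load fixed, the bigger box is less likely to be oversubscribed at any instant. The model deliberately ignores that faster cores finish work sooner (which would lower `p` on the 8-core Nehalem), so it only captures the pooling side of the question.

```python
from math import comb


def contention_probability(vms, p, cores):
    """P(more than `cores` of `vms` independent 1-vCPU VMs are runnable at once),
    each VM being runnable with probability p (binomial toy model)."""
    return sum(comb(vms, k) * p**k * (1 - p) ** (vms - k) for k in range(cores + 1, vms + 1))


# Same consolidation ratio (1.5 VMs per core) and same per-VM load on every box.
for cores in (8, 16, 24):
    vms = cores * 3 // 2
    print(f"{cores} cores, {vms} VMs at 50% busy: "
          f"P(contention) = {contention_probability(vms, 0.5, cores):.3f}")
```

For the 8-core case this gives 299/4096, about 0.073, and the probability falls as the same ratio is spread over 16 and 24 cores.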

>The x3650 M2 is nearly as fast as the 4-socket Intel x3850 M2

Well, this doesn't surprise me. We have seen this before: as usual, an M3 will be twice as fast as the M2 (so to speak). This is just Moore's law. I am sure that in the long run this will have consequences for how hardware vendors do business, but that's another story...

>If we build 15 VMs with 1 vCPU each, what could the SAPS be? Maybe 15 x 1285 = 19,275 SAPS?

Where does the 1285 come from? Did you mean 2730/2 = 1365 SAPS?

I would say that, given that two 1-vCPU VMs usually deliver more throughput than one 2-vCPU VM, a single 1-vCPU VM would have had a throughput of 2730/2 + something. That something would be difficult to quantify.

Also, VMware has demonstrated many times that VMs scale nearly linearly until you start overcommitting resources (CPU/memory), so I would say that a 16-core system would have been able to deliver a good (2730/2 + something) * 16. I speculate they could have reached 20,000 SAPS quite easily.
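Both extrapolations in this exchange boil down to "SAPS per vCPU times the number of single-vCPU VMs". The 1285 figure appears to be the Hyper-V result halved (2570 / 2), while 1365 is the ESX result halved. This sketch assumes perfectly linear scaling, as suggested above for the case without overcommitment, and models the unquantified "+ something" as a hypothetical `bonus` parameter that defaults to zero.

```python
# Published 2-vCPU results quoted in the thread (SAPS)
HYPER_V_2VCPU = 2570  # Dell R900, Windows 2008 Hyper-V
ESX_2VCPU = 2730      # IBM x3850 M2, VMware ESX Server 3.5


def extrapolate(saps_2vcpu, vms, bonus=0.0):
    """Estimated SAPS for `vms` 1-vCPU VMs, assuming linear scaling.

    Per-VM throughput is half the 2-vCPU result plus a hypothetical
    `bonus` (the "+ something" a 1-vCPU VM may gain).
    """
    return (saps_2vcpu / 2 + bonus) * vms


print("Hyper-V basis, 15 VMs:", extrapolate(HYPER_V_2VCPU, 15))
print("ESX basis, 16 VMs:", extrapolate(ESX_2VCPU, 16))
```

Even with `bonus=0`, the ESX-based figure for 16 VMs is 21,840 SAPS, consistent with the guess that 20,000 SAPS would have been within reach.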

Massimo.
