Looking at the VMmark results, I come to the conclusion that more cores per socket on Intel (six-core CPUs, Dell R900)
and more sockets on the AMD platform (HP DL785) are not efficient.
As you can see, going from 16 to 24 cores (50 % more cores) increases the VMmark result by only about 25 %.
Also, going from 16 to 32 cores (100 % more cores) increases the VMmark result by only about 50 %.
HP ProLiant DL785 G5 | 8 sockets | 32 total cores | 21.88@16 tiles
Dell PowerEdge R900 | 4 sockets | 24 total cores | 18.49@14 tiles
HP ProLiant DL585 G5 | 4 sockets | 16 total cores | 14.74@10 tiles
IBM System x3650 | 2 sockets | 8 total cores | 8.63@6 tiles
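For anyone who wants to check the percentages, here is a small sketch that works only from the four scores listed above. The function name `scaling` and the "efficiency" ratio (score gain divided by core gain) are my own shorthand, not a VMmark metric. By these numbers the 16-to-24 step comes out near 25 % and the 16-to-32 step a little under 50 %.

```python
# VMmark scores copied from the list above: (total cores, score).
results = {
    "IBM System x3650": (8, 8.63),
    "HP ProLiant DL585 G5": (16, 14.74),
    "Dell PowerEdge R900": (24, 18.49),
    "HP ProLiant DL785 G5": (32, 21.88),
}


def scaling(base, target):
    """Return (core gain, score gain, efficiency), each as a fraction.

    "Efficiency" is a hypothetical convenience ratio: score gain / core gain.
    """
    base_cores, base_score = results[base]
    cores, score = results[target]
    core_gain = cores / base_cores - 1
    score_gain = score / base_score - 1
    return core_gain, score_gain, score_gain / core_gain


for base, target in [
    ("IBM System x3650", "HP ProLiant DL585 G5"),
    ("HP ProLiant DL585 G5", "Dell PowerEdge R900"),
    ("HP ProLiant DL585 G5", "HP ProLiant DL785 G5"),
]:
    core_gain, score_gain, eff = scaling(base, target)
    print(f"{base} -> {target}: +{core_gain:.0%} cores, "
          f"+{score_gain:.0%} score, efficiency {eff:.2f}")
```

Keep in mind that these systems differ in CPU family, clock speed and vendor, so the ratio mixes several effects and is not a pure core-count measurement.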
Interesting.
I know this might sound like an IBM marketing statement, but for the sake of the discussion I need to bring in more data points. We have commissioned a study using an alternative (yet similar) benchmark to VMmark that showed this:
The report doesn't extrapolate this explicitly, but if you do the math you will notice that going from a 4-socket 3850M2 to an 8-socket 3950M2 the scalability is (almost) linear, i.e. it doubles.
I think it has been very clear to all that, with the current HyperTransport implementation, AMD scalability beyond 4 sockets is pretty poor (remote memory access can require 2 hops vs. 1 hop for 4-socket configs). I am actually surprised HP decided to publish that result, which kind of proves the theory.
More interesting is the 4-core vs. 6-core result. However, consider that, from your table, moving from 8 to 16 cores buys you 70% with 100% more cores. Moving from 16 to 24 buys you about 25% with 50% more cores: not the end of the world in my opinion. Also consider that the 18.49 result can only be improved.
Massimo.
Hello Massimo,
Why is IBM not showing more VMmark results? I want to see the X3755 M2 and the 8-socket x3950 M2.
You are right about the HP DL785. I would have posted the result only once the Shanghai CPUs (with HyperTransport 3) are available. I think Sun will then also show their results for the X4600 with 8 cores.
I don't like the vConsolidate benchmark because it is made by Intel. Yes, it is based on 4 different benchmarks, but VMmark allows you to compare AMD and Intel CPUs.
I know VMmark only benchmarks VMware's hypervisor, while vConsolidate benchmarks several hypervisors.
What I really want to know is: is the X4 chipset dying with the Nehalem CPUs, or is an X5 chipset coming with Nehalem?
You raise some good points.
With the arrival of the Dunnington CPUs the 8-socket 3950M2 would be "challenging", since ESX only supports 32 pCPUs and that config would stack up 48 cores. I am assuming here that most end users would want the 6-core Dunnington SKUs (or would you, for some reason, go with the 4-core Dunnington SKUs? I am interested in your opinion).
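The core-count problem is simple arithmetic, sketched below. The 32 pCPU limit is the figure quoted in this post; the function name `fits_esx_limit` and the list of configurations are only illustrative.

```python
# Limit quoted in the post above: ESX supports at most 32 physical CPUs (cores).
ESX_MAX_PCPU = 32


def fits_esx_limit(sockets, cores_per_socket, limit=ESX_MAX_PCPU):
    """Return (total cores, True if the host stays within the pCPU limit)."""
    total = sockets * cores_per_socket
    return total, total <= limit


for label, sockets, cores in [
    ("8-socket, 6-core Dunnington", 8, 6),
    ("8-socket, 4-core", 8, 4),
    ("4-socket, 6-core Dunnington", 4, 6),
]:
    total, ok = fits_esx_limit(sockets, cores)
    print(f"{label}: {total} cores, within limit: {ok}")
```

Only the 8-socket, 6-core combination exceeds the limit, which is why the choice between 4-core and 6-core SKUs matters on that box.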
I understand the vConsolidate / VMmark issues. Unfortunately there is no standard virtualization benchmark at this point, so whichever you pick, there will always be someone asking why you didn't pick the other.
This will change, as there WILL be an industry-standard benchmark, but for the time being I have to use the motto of one of my colleagues: "the beauty of standards is that there are many of those".
I can't answer the last question for obvious reasons, but if you want to share your opinion (perhaps privately :) ) I would appreciate it.
Massimo.
Adding cores does not equal adding processors with regard to performance. A dual-core processor is not going to perform as well as two single-core processors.
v/r,
Alan
For the sake of this thread (and its high-level tone), it is my opinion that it is (going to perform as well as two single-core processors).
There might be situations where cores on different physical CPU packages (i.e. sockets) help performance, and there might be situations where cores on the same physical CPU package help performance.
Getting into the details of a complex matter like this would require a very specific analysis of the workload patterns and so on. As I said, for the tone of this thread, it is my opinion that not taking the cores/sockets ratio into account is not a big deal.
Massimo.
Hi Massimo,
VMmark results are out for the X3850 M2 8-socket system and the HP DL785 8-socket system.
X3850 M2 tiles 8 sockets 32 total cores
The best 4-socket system is the Dell R905 with 20.35@14 tiles.
Where are the results for the IBM X3755 with Shanghai 2.7 GHz and ESXi on USB or SSD flash?
The X3755 is the best 4-socket Opteron system when it comes to memory (no downgrade to 533 MHz, as the other vendors do).
But where is the ESXi option for the X3755? Is a new one coming in June 2009 with the Istanbul Opteron?
Mesitermn,
you know I can't comment on future products.
We do provide ESXi embedded as part of specific server models rather than an option for all server models. Right now we are not offering an ESXi embedded 3755 model.
Massimo.
Yes Nehalem is a good piece of technology.
It will be interesting, though, to assess (when it's out) whether it would be better to go with one quad-socket/6-core (24 cores) server vs. 2 x dual-socket/4-core (16 cores) servers, given that VI3 is licensed per socket. Plus consider the usual story of HE boxes having more memory slots and I/O slots.
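To make the licensing angle concrete, here is a rough sketch that counts licensed sockets and cores for the two options. It assumes one license unit per physical socket, as stated above. The `Option` class is purely illustrative, and the sketch deliberately ignores per-core performance, memory slots and I/O.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    servers: int
    sockets_per_server: int
    cores_per_socket: int

    @property
    def licensed_sockets(self) -> int:
        # Assumption: one license unit per physical socket.
        return self.servers * self.sockets_per_server

    @property
    def total_cores(self) -> int:
        return self.licensed_sockets * self.cores_per_socket


options = [
    Option("1 x quad-socket, 6 cores/socket", 1, 4, 6),
    Option("2 x dual-socket, 4 cores/socket", 2, 2, 4),
]

for opt in options:
    per_socket = opt.total_cores / opt.licensed_sockets
    print(f"{opt.name}: {opt.licensed_sockets} licensed sockets, "
          f"{opt.total_cores} cores, {per_socket:.0f} cores per licensed socket")
```

Both options need the same number of socket licenses, so the comparison comes down to how much work each licensed socket can do, which core counts alone do not settle.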
Impressive.
Massimo.
What is really impressive is that the 2-socket Nehalem X5570 2.93 GHz is as fast in the SAP SD benchmark as the 4-socket X7460 2.66 GHz.
This means that Nehalem with 2 sockets x 4 cores = 8 cores (16 threads with Hyper-Threading) is as fast as Dunnington with 4 sockets x 6 cores = 24 cores (24 threads).
Looking at the 24-core VMmark results, Nehalem could reach a VMmark result between 16 and 19 with 12 to 14 tiles.
But what is the difference in scheduling between 24 cores (Dunnington), 16 cores (Shanghai) and 8 cores (Nehalem), when they all have nearly the same VMware results?
24 cores means to me 23 VMs without contention, when every VM uses 100 % of its CPU resources.
16 cores means to me 15 VMs without contention, when every VM uses 100 % of its CPU resources.
8 cores means to me 7 VMs without contention, when every VM uses 100 % of its CPU resources.
Also, a GHz on Dunnington, Shanghai and Nehalem is not the same.
24 x 2.66 GHz = 63.84 GHz (Dunnington)
16 x 2.7 GHz = 43.2 GHz (Shanghai)
8 x 2.93 GHz = 23.44 GHz (Nehalem)
I know the GHz comparison is worthless, but it is funny. I wish we had something like this: a pound is 500 grams.
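The two back-of-the-envelope rules above (aggregate GHz and "cores minus one" VMs) can be written down as follows. The `reserved=1` parameter is my reading of the n-1 rule, presumably one core kept free for the hypervisor; that interpretation is an assumption, not something the post states.

```python
# (platform, total cores, clock in GHz) as listed above.
platforms = [
    ("Dunnington", 24, 2.66),
    ("Shanghai", 16, 2.70),
    ("Nehalem", 8, 2.93),
]


def aggregate_ghz(cores, ghz):
    """Naive sum of clock rates; says nothing about work done per clock."""
    return cores * ghz


def vms_without_contention(cores, reserved=1):
    """Single-vCPU VMs that can each run at 100 % CPU with `reserved` cores left free."""
    return cores - reserved


for name, cores, ghz in platforms:
    print(f"{name}: {aggregate_ghz(cores, ghz):.2f} GHz aggregate, "
          f"{vms_without_contention(cores)} fully busy 1-vCPU VMs")
```

The aggregate figure differs by almost a factor of three across the platforms while the benchmark results are close, which is exactly why the GHz sum is a poor yardstick.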
Hi Massimo,
nice, the benchmark for the IBM X3650 M2 is also out.
The Intel Nehalem CPUs used in the IBM X3650 M2, FSC TX300 S5 and HP DL380 G6 are 70-80 percent faster than the AMD Shanghai 2.7 GHz in the HP DL385 G5p. I marked them red.
The X3650 M2 is nearly as fast as the 4-socket Intel X3850 M2! I marked it green.
Date | Vendor | SD benchmark users | Avg. dialog response time (s) | Dialog steps/hour | SAPS | Order line items/hour | Operating system | Database | SAP ECC release | Central server | Memory (MB)
12/19/2008 | IBM | 5100 | 1.98 | 1532000 | 25530 | 510670 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM System x3650 M2, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 Ghz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49152 | |
12/17/2008 | IBM | 4400 | 1.99 | 1322000 | 22030 | 440670 | Red Hat Enterprise Linux Server 5.2 on XEN 3.1.0 (using 24 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | |
12/17/2008 | IBM | 4386 | 1.96 | 1320000 | 22000 | 440000 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM BladeCenter LS42, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 65536 | |
12/16/2008 | Dell | 4010 | 1.23 | 1286000 | 21430 | 428670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 122880 | |
12/15/2008 | Fujitsu Siemens Computers | 4715 | 1.96 | 1419000 | 23650 | 473000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Fujitsu Siemens Computers PRIMERGY Model TX300 S5 / RX300 S5, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 Ghz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49152 | |
12/15/2008 | HP | 4995 | 1.99 | 1500000 | 25000 | 500000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL380 G6, 2 Processors / 8 Cores / 16 Threads, Intel Xeon Processor X5570, 2.93 Ghz, 64 KB L1 cache and 256 KB L2 cache per core, 8 MB L3 cache per processor | 49125 | |
12/9/2008 | Sun Microsystems | 7825 | 1.96 | 2356000 | 39270 | 785330 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Fire X4600M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 131072 | |
12/2/2008 | IBM | 5300 | 1.98 | 1593000 | 26550 | 531000 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | |
11/18/2008 | IBM | 5156 | 1.97 | 1551000 | 25850 | 517000 | Red Hat Enterprise Linux 5.2 | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | |
11/14/2008 | HP | 2752 | 1.98 | 827000 | 13780 | 275670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL385 G5p, 2 Processors / 8 Cores / 8 Threads, Quad-Core AMD Opteron Processor 2384, 2.7 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 32768 | |
11/12/2008 | HP | 7010 | 1.88 | 2124000 | 35400 | 708000 | SuSE Linux Enterprise Server 10 | Oracle 10g | 6.0 (2005) | HP ProLiant DL785 G5, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8384, 2.7 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 6 MB L3 cache per processor | 131072 | |
10/22/2008 | Sun Microsystems | 5800 | 1.73 | 1780000 | 29670 | 593330 | Solaris 10 | MaxDB 7.6 | 6.0 (Unicode) | Sun Fire X4600M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core AMD Opteron Processor 8360 SE, 2.5 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 131072 | |
10/17/2008 | Fujitsu Siemens Computers | 5135 | 1.98 | 1543000 | 25720 | 514330 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Fujitsu Siemens Computers PRIMERGY Model RX600 S4, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | |
9/30/2008 | Dell | 501 | 1.72 | 154000 | 2570 | 51330 | Windows Server 2003 Enterprise Edition on Windows 2008 Hyper-V (using 2 virtual CPUs) | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 90112 | |
9/12/2008 | HP | 5155 | 1.97 | 1550000 | 25830 | 516670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL580 G5, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 | |
9/12/2008 | Sun Microsystems | 4600 | 1.94 | 1387000 | 23120 | 462330 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Fire X4450, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 81920 | |
9/12/2008 | HP | 4432 | 1.99 | 1331000 | 22180 | 443670 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant BL680c G5, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor MP E7450, 2.4 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per two cores, 12 MB L3 cache per processor | 65536 | |
9/12/2008 | HP | 2518 | 1.99 | 756000 | 12600 | 252000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL380 G5, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5470, 3.33 Ghz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | |
9/12/2008 | HP | 2518 | 1.99 | 756000 | 12600 | 252000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant BL460c, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5470, 3.33 Ghz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | |
9/8/2008 | IBM | 9200 | 1.95 | 2770000 | 46170 | 923330 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3950 M2, 8 Processors / 48 Cores / 48 Threads, Intel Xeon Processor MP X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 131072 | |
8/20/2008 | Sun Microsystems | 2100 | 1.98 | 631000 | 10520 | 210330 | Windows Server 2008 Enterprise Edition | SQL Server 2008 | 6.0 (2005) | Sun Blade X8450, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor E7340, 2.4 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 | |
7/30/2008 | IBM | 545 | 1.98 | 164000 | 2730 | 54670 | Windows Server 2003 Enterprise Edition on VMware ESX Server 3.5 (using 2 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 | |
7/15/2008 | Dell | 2121 | 1.73 | 651000 | 10850 | 217000 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model M600, 2 Processors / 8 Cores / 8 Threads, Quad-Core Intel Xeon Processor X5460, 3.16 Ghz, 64 KB L1 cache per core and 6 MB L2 cache per 2 cores | 32768 | |
7/11/2008 | HP | 3801 | 1.99 | 1141000 | 19020 | 380330 | Windows Server 2003 Enterprise Edition | SQL Server 2005 | 6.0 (2005) | HP ProLiant DL585 G5, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron Processor 8360 SE, 2.5 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 | |
6/25/2008 | IBM | 6615 | 1.99 | 1986000 | 33100 | 662000 | Windows Server 2003 Datacenter Edition | DB2 9.5 | 6.0 (2005) | IBM System x3950 M2, 8 Processors / 32 Cores / 32 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 131072 | |
6/19/2008 | Sun Microsystems | 3550 | 1.94 | 1071000 | 17850 | 357000 | Solaris 10 | MaxDB 7.6 | 6.0 (2005) (Unicode) | Sun Blade Model X8440, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron processor Model 8356, 2.3 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 | |
6/19/2008 | IBM | 3540 | 1.99 | 1063000 | 17720 | 354330 | Windows Server 2003 Enterprise Edition | DB2 9.5 | 6.0 (2005) | IBM System x3755, 4 Processors / 16 Cores / 16 Threads, Quad-Core AMD Opteron processor Model 8356, 2.3 Ghz, 128 KB L1 cache and 512 KB L2 cache per core, 2 MB L3 cache per processor | 65536 |
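A quick way to read the rows above is to compare SAPS per system and per core. The sketch below copies the core counts and SAPS values from five of the rows and uses the HP DL385 G5p (Shanghai) as the baseline. For the x3850 M2 it takes the Windows Server 2003 Datacenter result (26550 SAPS). The helper names are illustrative only.

```python
# (system, total cores, SAPS) copied from the rows above.
rows = [
    ("IBM System x3650 M2 (2 x X5570)", 8, 25530),
    ("FSC PRIMERGY TX300 S5 (2 x X5570)", 8, 23650),
    ("HP ProLiant DL380 G6 (2 x X5570)", 8, 25000),
    ("HP ProLiant DL385 G5p (2 x Opteron 2384)", 8, 13780),
    ("IBM System x3850 M2 (4 x X7460)", 24, 26550),
]

BASELINE = "HP ProLiant DL385 G5p (2 x Opteron 2384)"
baseline_saps = next(saps for name, _, saps in rows if name == BASELINE)


def saps_per_core(cores, saps):
    return saps / cores


def relative_to_baseline(saps):
    """Throughput as a multiple of the baseline system's SAPS."""
    return saps / baseline_saps


for name, cores, saps in rows:
    print(f"{name}: {saps_per_core(cores, saps):.0f} SAPS/core, "
          f"{relative_to_baseline(saps):.2f}x baseline")
```

The results come from different operating systems and databases, so the ratios are indicative rather than a strict CPU-to-CPU comparison.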
Oops, there is also a comparison between Xen, Hyper-V and VMware ESX in the SAPS benchmark!
I read this in the following way:
1.) With Xen, 24 x 1 VM with 1 vCPU each were used. (I do not believe that Xen can run 1 VM with 24 vCPUs at the moment.)
2.) With Hyper-V, 1 x 1 VM with 2 vCPUs was used.
3.) With ESX 3.5, 1 x 1 VM with 2 vCPUs was used.
If we build 15 VMs with 1 vCPU each, what could the SAPS be? Maybe 15 x 1285 = 19275 SAPS?
12/17/2008 | IBM | 4400 | 1.99 | 1322000 | 22030 | 440670 | Red Hat Enterprise Linux Server 5.2 on XEN 3.1.0 (using 24 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 24 Cores / 24 Threads, Intel Xeon Processor X7460, 2.66 Ghz, 64 KB L1 cache per core and 3 MB L2 cache per 2 cores, 16 MB L3 cache per processor | 65536 |
9/30/2008 | Dell | 501 | 1.72 | 154000 | 2570 | 51330 | Windows Server 2003 Enterprise Edition on Windows 2008 Hyper-V (using 2 virtual CPUs) | SQL Server 2005 | 6.0 (2005) | Dell PowerEdge Model R900, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 90112 |
7/30/2008 | IBM | 545 | 1.98 | 164000 | 2730 | 54670 | Windows Server 2003 Enterprise Edition on VMware ESX Server 3.5 (using 2 virtual CPUs) | DB2 9.5 | 6.0 (2005) | IBM System x3850 M2, 4 Processors / 16 Cores / 16 Threads, Quad-Core Intel Xeon Processor X7350, 2.93 Ghz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores | 65536 |
>But what is the difference in scheduling between 24 cores (Dunnington), 16 cores (Shanghai) and 8 cores (Nehalem), when they all have nearly the same VMware results?
>24 cores means to me 23 VMs without contention, when every VM uses 100 % of its CPU resources.
>16 cores means to me 15 VMs without contention, when every VM uses 100 % of its CPU resources.
>8 cores means to me 7 VMs without contention, when every VM uses 100 % of its CPU resources.
That's the theory behind scale up vs. scale out. The more engines you have, the more likely you will have free processors to handle your workloads. This is the basis of the mainframes, but admittedly ESX is not even close to the optimization that has been achieved on those boxes. As per your comment, it would be interesting to study the difference (in scheduling efficiency) between 8 very fast cores and 24 average cores. What's better? One could speculate.
>The X3650 M2 is nearly as fast as the 4-socket Intel X3850 M2
Well, this doesn't surprise me. We have seen this before: an M3 will be twice as fast as the M2 (so to speak), as usual. This is just Moore's law. I am sure in the long run this will have effects and consequences on how hardware vendors do business, but that's another story.
>If we build 15 VMs with 1 vCPU each, what could the SAPS be? Maybe 15 x 1285 = 19275 SAPS?
Where does the 1285 come from? Did you mean 2730/2 = 1365 SAPS?
I would say that, given that 2 x 1-vCPU VMs usually deliver more throughput than 1 x 2-vCPU VM, a single 1-vCPU VM would have had a throughput of 2730/2 + something. The "something" would be difficult to quantify.
Also, VMware has demonstrated many times nearly linear scalability of VMs until you start overcommitting resources (CPU/memory), so I would say that a 16-core system would have been able to deliver a good (2730/2 + something) * 16. I speculate they would have been able to touch 20000 SAPS quite easily.
Massimo.
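The estimate above can be sketched as a one-line formula. It assumes perfectly linear scaling with no overcommitment. The `per_vcpu_bonus` parameter stands in for the unquantified "something" and is purely hypothetical, so it defaults to zero.

```python
# SAPS of the single 2-vCPU VM in the ESX 3.5 result quoted above.
SAPS_2VCPU_VM = 2730


def estimate_host_saps(vm_count, per_vcpu_bonus=0.0, saps_2vcpu=SAPS_2VCPU_VM):
    """Linear estimate for `vm_count` single-vCPU VMs.

    per_vcpu_bonus is a hypothetical fractional gain of a 1-vCPU VM over
    half of a 2-vCPU VM (the "something"); 0.0 means no gain.
    """
    per_vm = saps_2vcpu / 2 * (1 + per_vcpu_bonus)
    return vm_count * per_vm


for vms in (15, 16):
    print(f"{vms} x 1-vCPU VMs: about {estimate_host_saps(vms):.0f} SAPS")
```

Even with the bonus left at zero, both 15 and 16 VMs land above 20000 SAPS under the linear-scaling assumption, which is the point being made.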