cmutchle
Enthusiast

Terasort benchmarking with BDE 1.1

I have moved into the production POC phase of my BDE deployment and am looking for guidance on the kind of Terasort benchmarking I should be doing. I am curious what sort of Terasort jobs were run to generate the benchmarking results advertised in the documentation.

Do you have any guidance on the parameters I should pass to generate a dataset and then run Terasort against it? Specifically, I am looking to perform tests against the standard CentOS 5.9 template, a stock CentOS 6.2 template, and a customized CentOS 6.2 template using an in-house-built kernel. I will also be running the same tests against a traditional Hadoop cluster vs. a distributed data/compute Hadoop cluster, both deployed with BDE.

Any suggestions for an ops-oriented person would be greatly appreciated.

Thanks.

--

Chris Mutchler

cmutchle@adobe.com

4 Replies
Cindia
Contributor

Hi Chris,

Please refer to the examples below for running the Terasort benchmark with the default Hadoop package.

hadoop jar hadoop_home_path/hadoop-XXXXX-examples.jar teragen -Dmapred.map.tasks=[slot number] 10000000000 /usr/test/input # a 1TB dataset (10 billion 100-byte rows) is used here, but you should tune the row count according to your own capacity

hadoop jar hadoop_home_path/hadoop-XXXXX-examples.jar terasort -Dmapred.reduce.tasks=[slot number] /usr/test/input /usr/test/output

hadoop jar hadoop_home_path/hadoop-XXXXX-examples.jar teravalidate /usr/test/output /usr/test/final
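A hedged addition, not from the original reply: TeraValidate writes its report to the output directory you pass it, so after the run you can check the result with something like the line below; a clean (empty) report should mean the sort order validated.

hadoop fs -cat /usr/test/final/part-* # any error/misorder lines here mean the sorted output was not fully ordered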

From a performance perspective, we recommend adopting the CentOS 6.2 template. It would be very nice if you could provide more information on the goal of your POC and the primary user/ops scenarios; then we can provide better support to you.

jessehuvmw
Enthusiast

Hi Chris,

You can also refer to "A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5". It is based on CDH3 and vSphere and doesn't use BDE to deploy the CDH3 cluster, but the Hadoop settings it uses for Terasort benchmarking should still be meaningful for a Hadoop cluster created by BDE.

Thanks

Jesse

Jackstone
Contributor

It's a little tricky to do a performance comparison. You need to make sure the Apache Hadoop configurations (mapred-site.xml and hdfs-site.xml) are the same before running the benchmarks (Terasort over a 1TB dataset should be a good candidate); a quick diff of the config files, as sketched below, can confirm this.
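A minimal sketch of that check, assuming you have copied each cluster's config files into local directories clusterA/ and clusterB/ (hypothetical paths, not from the original reply):

diff clusterA/mapred-site.xml clusterB/mapred-site.xml # any output means the MapReduce settings differ
diff clusterA/hdfs-site.xml clusterB/hdfs-site.xml # likewise for the HDFS settings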

One thing you may want to know is that all VM disks are provisioned as "thick provision lazy zeroed". To get the best IO performance, it's suggested to fill up the disks by writing to HDFS before running the benchmarks; of course, you can delete the files right after the fill, as in the sketch below.
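A hedged sketch of one way to do that pre-fill, reusing the teragen tool from earlier in this thread (the /prefill path and row count are illustrative only, not from the original reply):

hadoop jar hadoop_home_path/hadoop-XXXXX-examples.jar teragen 5000000000 /prefill # ~500GB of throwaway rows, forcing the lazy-zeroed blocks to be written once
hadoop fs -rmr /prefill # reclaim the HDFS space once the disks have been touched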

Binbin

tvanpeer
Contributor

Chris,

FWIW, I have done some Terasort benchmarks on a BDE 1.0 cluster with Apache Hadoop, which means Hadoop 1.x. So if you're thinking about benchmarking with a Hadoop 2.x distribution (e.g. Pivotal 1.1), none of the text below applies.

Make sure dfs.blocksize and io.file.buffer.size are set to reasonably high values: somewhere between 128MB and 1GB for dfs.blocksize, and maybe 64KB for io.file.buffer.size. These are best set cluster-wide in the appropriate XML files. You cannot pass an io.file.buffer.size value on the command line; it is set at cluster boot.
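As a sketch, those cluster-wide settings might look like the snippet below, with illustrative values, assuming a Hadoop 1.x layout (in 1.x the block size property is named dfs.block.size; dfs.blocksize is the 2.x name):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.size</name> <!-- dfs.blocksize in Hadoop 2.x -->
  <value>268435456</value> <!-- 256MB, in the 128MB-1GB range suggested above -->
</property>

<!-- core-site.xml -->
<property>
  <name>io.file.buffer.size</name>
  <value>65536</value> <!-- 64KB -->
</property>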

hadoop jar hadoop***examples***.jar teragen -Dmapred.map.tasks=70 10000000000 /terasort/inputdir-1TB-70

This generates 1TB in 70 chunks in a directory called /terasort/inputdir-1TB-70; every map task writes its share of the 1TB. The number of map tasks could/should be equal to the number of map slots in your cluster. Unless you want to benchmark teragen itself, I wouldn't bother specifying any other parameters.

Now, if you want to terasort this, you will get something like:

hadoop jar hadoop***examples***.jar terasort -Dmapred.reduce.tasks=40 -Dio.sort.factor=40 -Dio.sort.mb=400 -Dmapred.child.java.opts="-Xmx2048m" -Dmapred.compress.map.output=true /terasort/inputdir-1TB-70 /terasort/outputdir

You cannot set the number of map tasks in terasort; that is done for you. I set the number of reduce tasks equal to the number of reduce slots I have. There is a school of thought (Cloudera, to be more specific) that states io.sort.mb should be 10 times io.sort.factor for terasort. Your Java environment should have plenty of memory, but going beyond 2GB did not buy me much more performance. Keep in mind that the total number of slots (map + reduce) multiplied by this Java memory reservation, plus some memory (2GB?) to run your OS, should fit into your VM memory reservation. mapred.compress.map.output enables compression in the shuffle phase; it saved me a lot of time (and space).
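To make that memory math concrete with hypothetical numbers (not Tom's actual cluster): a worker VM with 8 map slots and 4 reduce slots at -Xmx2048m needs roughly

(8 + 4 slots) x 2GB per task JVM = 24GB
24GB + ~2GB for the OS = ~26GB, which should fit within the VM memory reservation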

There are some other, more exotic parameters you may want to play with (if you have the time), like mapred.job.reduce.input.buffer.percent, io.sort.record.percent or mapred.output.compression.codec, but the set above will get you a long way.
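A hedged sketch of what adding those might look like; the values are illustrative guesses rather than tuned recommendations, and mapred.output.compression.codec only takes effect when mapred.output.compress=true:

hadoop jar hadoop***examples***.jar terasort -Dmapred.reduce.tasks=40 -Dmapred.job.reduce.input.buffer.percent=0.7 -Dio.sort.record.percent=0.15 -Dmapred.output.compress=true -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec /terasort/inputdir-1TB-70 /terasort/outputdir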

I'm not sure you need to pre-fill thick-provisioned storage as the previous writer mentioned. I could see a need to pre-fill thin-provisioned storage, but then you might want to use thick provisioning in the first place.

Another switch you will want to play with is IO shares in BDE itself; it has quite a dramatic effect if you are running just one cluster.

Hope this helps.

I tried a Hadoop 2.x distribution on BDE a couple of times, but the terasort program in that particular distribution ran in some kind of Hadoop 1.x compatibility mode, and performance was, well, pretty dismal.

Regards,

Tom
