Hadoop Cluster Performance Testing
吴脂娟 2022/02/08
# I/O write speed with the page cache (buffered writes)
[root@cluster2 hadoop-mapreduce]# time dd if=/dev/zero of=test.dbf bs=8k count=300000
300000+0 records in
300000+0 records out
2457600000 bytes (2.5 GB) copied, 1.64114 s, 1.5 GB/s

real    0m1.644s
user    0m0.089s
sys     0m1.553s

# Actual disk I/O write speed (oflag=direct bypasses the page cache)
[root@cluster2 hadoop-mapreduce]# time dd if=/dev/zero of=test.dbf bs=8k count=300000 oflag=direct
300000+0 records in
300000+0 records out
2457600000 bytes (2.5 GB) copied, 24.2538 s, 101 MB/s

real    0m24.586s
user    0m0.143s
sys     0m5.590s
[root@cluster2 hadoop-mapreduce]# dd if=test.dbf bs=8k count=300000 of=/dev/null
300000+0 records in
300000+0 records out
2457600000 bytes (2.5 GB) copied, 2.53032 s, 971 MB/s
# dd stops at end of device: /dev/sda1 holds only 1 MiB (128 x 8 KiB records)
[root@cluster2 hadoop-mapreduce]# time dd if=/dev/sda1 of=test.dbf bs=8k count=300000
128+0 records in
128+0 records out
1048576 bytes (1.0 MB) copied, 0.0083957 s, 125 MB/s

real    0m0.219s
user    0m0.000s
sys     0m0.212s
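The rates dd prints are simply bytes copied divided by wall-clock time, so the cached and direct figures above can be cross-checked with a one-line calculation:

```shell
# Recompute dd's reported rates from the bytes/seconds in the runs above.
awk 'BEGIN {
  printf "cached: %.1f GB/s\n", 2457600000 / 1.64114 / 1e9
  printf "direct: %.0f MB/s\n", 2457600000 / 24.2538 / 1e6
}'
# -> cached: 1.5 GB/s
# -> direct: 101 MB/s
```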
The following tests use the benchmark tools bundled with Hadoop to measure cluster performance. Test platform: Hadoop 3.0.0 on CDH 6.3.2.
TestDFSIO write test:

# Write 10, 20, 100, and 1000 files of 128 MB each to the HDFS cluster
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 20 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 100 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 1000 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

# Results:
[root@cluster2 hadoop-mapreduce]# cat /tmp/TestDFSIO_results.log
----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 10:24:02 CST 2022
        Number of files: 10
 Total MBytes processed: 1280
      Throughput mb/sec: 12.48
 Average IO rate mb/sec: 67.14
  IO rate std deviation: 87.56
     Test exec time sec: 35.84

----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 10:25:25 CST 2022
        Number of files: 20
 Total MBytes processed: 2560
      Throughput mb/sec: 10.74
 Average IO rate mb/sec: 66.8
  IO rate std deviation: 81.76
     Test exec time sec: 46.41

----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 10:32:15 CST 2022
        Number of files: 100
 Total MBytes processed: 12800
      Throughput mb/sec: 3.05
 Average IO rate mb/sec: 48.24
  IO rate std deviation: 66.78
     Test exec time sec: 175.71

----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 15:30:14 CST 2022
        Number of files: 1000
 Total MBytes processed: 128000
      Throughput mb/sec: 3.03
 Average IO rate mb/sec: 62.9
  IO rate std deviation: 91.22
     Test exec time sec: 1455.23
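The four write runs differ only in -nrFiles, so a loop avoids the repetition. This sketch just echoes the commands as a dry run; drop the echo to actually submit the jobs:

```shell
# Dry run: build the four TestDFSIO write invocations in a loop.
JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
for n in 10 20 100 1000; do
  echo sudo -uhdfs hadoop jar "$JAR" TestDFSIO \
    -write -nrFiles "$n" -size 128MB -resFile /tmp/TestDFSIO_results.log
done
```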
TestDFSIO read test:

# Read 10, 20, 100, and 1000 files of 128 MB each from the HDFS cluster
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 20 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 100 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 1000 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log

# Results:
[root@cdh04 ~]# cat /tmp/TestDFSIO_results.log
----- TestDFSIO ----- : read
            Date & time: Thu Feb 10 10:35:34 CST 2022
        Number of files: 10
 Total MBytes processed: 1280
      Throughput mb/sec: 84.21
 Average IO rate mb/sec: 856.02
  IO rate std deviation: 447.21
     Test exec time sec: 22.54

----- TestDFSIO ----- : read
            Date & time: Thu Feb 10 10:36:09 CST 2022
        Number of files: 20
 Total MBytes processed: 2560
      Throughput mb/sec: 235.99
 Average IO rate mb/sec: 721.61
  IO rate std deviation: 278.21
     Test exec time sec: 23.57

----- TestDFSIO ----- : read
            Date & time: Thu Feb 10 10:38:54 CST 2022
        Number of files: 100
 Total MBytes processed: 12800
      Throughput mb/sec: 44.66
 Average IO rate mb/sec: 661.21
  IO rate std deviation: 436.25
     Test exec time sec: 133.27

----- TestDFSIO ----- : read
            Date & time: Thu Feb 10 15:36:04 CST 2022
        Number of files: 1000
 Total MBytes processed: 128000
      Throughput mb/sec: 41.93
 Average IO rate mb/sec: 440.92
  IO rate std deviation: 311.14
     Test exec time sec: 297.88
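The results file accumulates one block per run, so a short awk filter is handy for pulling out files-vs-throughput pairs for comparison. The sample lines below are copied from the read results above:

```shell
# Extract (number of files, throughput) pairs from a TestDFSIO results log.
cat > /tmp/dfsio_sample.log <<'EOF'
----- TestDFSIO ----- : read
Number of files: 10
Throughput mb/sec: 84.21
----- TestDFSIO ----- : read
Number of files: 20
Throughput mb/sec: 235.99
EOF
awk -F': ' '/Number of files/ {n = $2} /Throughput/ {print n, $2}' /tmp/dfsio_sample.log
# -> 10 84.21
# -> 20 235.99
```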
# Inspect the test data; it is kept in HDFS under /benchmarks by default
[root@cdh04 ~]# hadoop fs -du -h /benchmarks/TestDFSIO
# First column: file size; second column: total size with the HDFS default replication factor of 3
11.0 K  33.1 K  /benchmarks/TestDFSIO/io_control
12.5 G  37.5 G  /benchmarks/TestDFSIO/io_data
85      255     /benchmarks/TestDFSIO/io_read
83      249     /benchmarks/TestDFSIO/io_write

# Clean up the TestDFSIO data
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -clean
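As the comment notes, the second du column is just the first column multiplied by the replication factor (3 by default); e.g. for io_data:

```shell
# 12.5 G of raw data replicated 3x occupies 37.5 G on the cluster.
awk 'BEGIN { printf "%.1f G\n", 12.5 * 3 }'
# -> 37.5 G
```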
nnbench stresses the NameNode with metadata operations:

# Create 1000 files with 12 mappers and 2 reducers
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
  -operation create_write \
  -maps 12 \
  -reduces 2 \
  -blockSize 1 \
  -bytesToWrite 0 \
  -numberOfFiles 1000 \
  -replicationFactorPerFile 3 \
  -readFileAfterOpen true \
  -baseDir /benchmarks/NNBench-`hostname`

22/02/10 10:45:07 INFO hdfs.NNBench: -------------- NNBench -------------- :
22/02/10 10:45:07 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
22/02/10 10:45:07 INFO hdfs.NNBench: Date & time: 2022-02-10 10:45:07,665
22/02/10 10:45:07 INFO hdfs.NNBench:
22/02/10 10:45:07 INFO hdfs.NNBench: Test Operation: create_write
22/02/10 10:45:07 INFO hdfs.NNBench: Start time: 2022-02-10 10:44:55,886
22/02/10 10:45:07 INFO hdfs.NNBench: Maps to run: 12
22/02/10 10:45:07 INFO hdfs.NNBench: Reduces to run: 2
22/02/10 10:45:07 INFO hdfs.NNBench: Block Size (bytes): 1
22/02/10 10:45:07 INFO hdfs.NNBench: Bytes to write: 0
22/02/10 10:45:07 INFO hdfs.NNBench: Bytes per checksum: 1
22/02/10 10:45:07 INFO hdfs.NNBench: Number of files: 1000
22/02/10 10:45:07 INFO hdfs.NNBench: Replication factor: 3
22/02/10 10:45:07 INFO hdfs.NNBench: Successful file operations: 0
22/02/10 10:45:07 INFO hdfs.NNBench:
22/02/10 10:45:07 INFO hdfs.NNBench: # maps that missed the barrier: 0
22/02/10 10:45:07 INFO hdfs.NNBench: # exceptions: 12000
22/02/10 10:45:07 INFO hdfs.NNBench:
22/02/10 10:45:07 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
22/02/10 10:45:07 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
22/02/10 10:45:07 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
22/02/10 10:45:07 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
22/02/10 10:45:07 INFO hdfs.NNBench:
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 52584
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 4511.0
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: Late maps: 0
22/02/10 10:45:07 INFO hdfs.NNBench: RAW DATA: # of exceptions: 12000

# Create 10000 files with 30 mappers and 3 reducers
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
  -operation create_write \
  -maps 30 \
  -reduces 3 \
  -blockSize 1 \
  -bytesToWrite 0 \
  -numberOfFiles 10000 \
  -replicationFactorPerFile 3 \
  -readFileAfterOpen true \
  -baseDir /benchmarks/NNBench-`hostname`

22/02/10 10:48:23 INFO hdfs.NNBench: -------------- NNBench -------------- :
22/02/10 10:48:23 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
22/02/10 10:48:23 INFO hdfs.NNBench: Date & time: 2022-02-10 10:48:23,567
22/02/10 10:48:23 INFO hdfs.NNBench:
22/02/10 10:48:23 INFO hdfs.NNBench: Test Operation: create_write
22/02/10 10:48:23 INFO hdfs.NNBench: Start time: 2022-02-10 10:48:08,52
22/02/10 10:48:23 INFO hdfs.NNBench: Maps to run: 30
22/02/10 10:48:23 INFO hdfs.NNBench: Reduces to run: 3
22/02/10 10:48:23 INFO hdfs.NNBench: Block Size (bytes): 1
22/02/10 10:48:23 INFO hdfs.NNBench: Bytes to write: 0
22/02/10 10:48:23 INFO hdfs.NNBench: Bytes per checksum: 1
22/02/10 10:48:23 INFO hdfs.NNBench: Number of files: 10000
22/02/10 10:48:23 INFO hdfs.NNBench: Replication factor: 3
22/02/10 10:48:23 INFO hdfs.NNBench: Successful file operations: 0
22/02/10 10:48:23 INFO hdfs.NNBench:
22/02/10 10:48:23 INFO hdfs.NNBench: # maps that missed the barrier: 0
22/02/10 10:48:23 INFO hdfs.NNBench: # exceptions: 30000
22/02/10 10:48:23 INFO hdfs.NNBench:
22/02/10 10:48:23 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
22/02/10 10:48:23 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
22/02/10 10:48:23 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
22/02/10 10:48:23 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
22/02/10 10:48:23 INFO hdfs.NNBench:
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 237510
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 8475.0
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: Late maps: 0
22/02/10 10:48:23 INFO hdfs.NNBench: RAW DATA: # of exceptions: 30000

# Create 10000 files with 100 mappers and 50 reducers
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
  -operation create_write \
  -maps 100 \
  -reduces 50 \
  -blockSize 1 \
  -bytesToWrite 0 \
  -numberOfFiles 10000 \
  -replicationFactorPerFile 3 \
  -readFileAfterOpen true \
  -baseDir /benchmarks/NNBench-`hostname`

22/02/10 10:51:47 INFO hdfs.NNBench: -------------- NNBench -------------- :
22/02/10 10:51:47 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
22/02/10 10:51:47 INFO hdfs.NNBench: Date & time: 2022-02-10 10:51:47,246
22/02/10 10:51:47 INFO hdfs.NNBench:
22/02/10 10:51:47 INFO hdfs.NNBench: Test Operation: create_write
22/02/10 10:51:47 INFO hdfs.NNBench: Start time: 2022-02-10 10:51:13,989
22/02/10 10:51:47 INFO hdfs.NNBench: Maps to run: 100
22/02/10 10:51:47 INFO hdfs.NNBench: Reduces to run: 50
22/02/10 10:51:47 INFO hdfs.NNBench: Block Size (bytes): 1
22/02/10 10:51:47 INFO hdfs.NNBench: Bytes to write: 0
22/02/10 10:51:47 INFO hdfs.NNBench: Bytes per checksum: 1
22/02/10 10:51:47 INFO hdfs.NNBench: Number of files: 10000
22/02/10 10:51:47 INFO hdfs.NNBench: Replication factor: 3
22/02/10 10:51:47 INFO hdfs.NNBench: Successful file operations: 0
22/02/10 10:51:47 INFO hdfs.NNBench:
22/02/10 10:51:47 INFO hdfs.NNBench: # maps that missed the barrier: 61
22/02/10 10:51:47 INFO hdfs.NNBench: # exceptions: 39000
22/02/10 10:51:47 INFO hdfs.NNBench:
22/02/10 10:51:47 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
22/02/10 10:51:47 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
22/02/10 10:51:47 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
22/02/10 10:51:47 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
22/02/10 10:51:47 INFO hdfs.NNBench:
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 319190
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 1.644461482876E12
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: Late maps: 61
22/02/10 10:51:47 INFO hdfs.NNBench: RAW DATA: # of exceptions: 39000
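Note that all three runs report "Successful file operations: 0" alongside thousands of exceptions, so the figures above reflect failed metadata operations and the job logs deserve inspection. The NNBench report itself is interleaved with log4j prefixes; stripping them makes it easier to read (sample lines from the first run above):

```shell
# Strip the log4j prefix from NNBench report lines.
cat > /tmp/nnbench_sample.log <<'EOF'
22/02/10 10:45:07 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
22/02/10 10:45:07 INFO hdfs.NNBench: # exceptions: 12000
EOF
sed 's/^.*hdfs\.NNBench: //' /tmp/nnbench_sample.log
# -> TPS: Create/Write/Close: 0
# -> # exceptions: 12000
```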
# Clean up the NNBench test data
sudo -uhdfs hadoop fs -rm -r /benchmarks/NNBench-cluster0
mrbench runs a small job many times in a row; it checks whether small jobs on the cluster run repeatably and efficiently.
# Run one job 10 times
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench \
  -numRuns 10

# Results:
Map-Reduce Framework
  Map input records=1
  Map output records=1
  Map output bytes=3
  Map output materialized bytes=37
  Input split bytes=242
  Combine input records=0
  Combine output records=0
  Reduce input groups=1
  Reduce shuffle bytes=37
  Reduce input records=1
  Reduce output records=1
  Spilled Records=2
  Shuffled Maps =2
  Failed Shuffles=0
  Merged Map outputs=2
  GC time elapsed (ms)=142
  CPU time spent (ms)=2110
  Physical memory (bytes) snapshot=1510125568
  Virtual memory (bytes) snapshot=7833825280
  Total committed heap usage (bytes)=1857552384
  Peak Map Physical memory (bytes)=591335424
  Peak Map Virtual memory (bytes)=2607177728
  Peak Reduce Physical memory (bytes)=329736192
  Peak Reduce Virtual memory (bytes)=2620637184
Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
File Input Format Counters
  Bytes Read=3
File Output Format Counters
  Bytes Written=3

DataLines  Maps  Reduces  AvgTime (milliseconds)
1          2     1        17541

# Clean up the MRBench data
sudo -uhdfs hadoop fs -rm -r /benchmarks/MRBench
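Beyond -numRuns, mrbench also accepts job-shape options such as -maps, -reduces, and -inputLines (per its usage message); a sketch of a slightly larger repeated job, echoed here as a dry run:

```shell
# Dry run: print an mrbench invocation with an explicit job shape.
# The -maps/-reduces/-inputLines values are illustrative, not from the runs above.
echo sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench -numRuns 10 -maps 4 -reduces 2 -inputLines 100
```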
TeraGen generates random data:
# First generate ~1 GB of test data under /tmp/examples/terasort-input
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teragen 10000000 /tmp/examples/terasort-input

[root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/
953.7 M  2.8 G  /tmp/examples/terasort-input
[root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/terasort-input
0        0      /tmp/examples/terasort-input/_SUCCESS
476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00000
476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00001
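TeraGen writes fixed 100-byte rows, so the generated size is predictable: 10,000,000 rows is 1,000,000,000 bytes, which is the 953.7 M (MiB) that du reports above:

```shell
# 10,000,000 rows x 100 bytes, expressed in MiB as hadoop fs -du -h shows it.
awk 'BEGIN { printf "%.1f M\n", 10000000 * 100 / 1024 / 1024 }'
# -> 953.7 M
```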
TeraSort sorts the data:
# Reads /tmp/examples/terasort-input and writes the sorted result to /tmp/examples/terasort-output.
# TeraSort produces one output partition per reducer (20 in this run, per the part-r-* files below).
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort /tmp/examples/terasort-input /tmp/examples/terasort-output

[root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples
953.7 M  2.8 G    /tmp/examples/terasort-input
953.7 M  953.7 M  /tmp/examples/terasort-output
[root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/terasort-output
0       0      /tmp/examples/terasort-output/_SUCCESS
209     2.0 K  /tmp/examples/terasort-output/_partition.lst
46.9 M  46.9 M /tmp/examples/terasort-output/part-r-00000
48.5 M  48.5 M /tmp/examples/terasort-output/part-r-00001
47.7 M  47.7 M /tmp/examples/terasort-output/part-r-00002
47.2 M  47.2 M /tmp/examples/terasort-output/part-r-00003
48.5 M  48.5 M /tmp/examples/terasort-output/part-r-00004
47.7 M  47.7 M /tmp/examples/terasort-output/part-r-00005
47.9 M  47.9 M /tmp/examples/terasort-output/part-r-00006
48.6 M  48.6 M /tmp/examples/terasort-output/part-r-00007
47.0 M  47.0 M /tmp/examples/terasort-output/part-r-00008
47.2 M  47.2 M /tmp/examples/terasort-output/part-r-00009
46.5 M  46.5 M /tmp/examples/terasort-output/part-r-00010
47.3 M  47.3 M /tmp/examples/terasort-output/part-r-00011
47.8 M  47.8 M /tmp/examples/terasort-output/part-r-00012
47.0 M  47.0 M /tmp/examples/terasort-output/part-r-00013
48.8 M  48.8 M /tmp/examples/terasort-output/part-r-00014
48.2 M  48.2 M /tmp/examples/terasort-output/part-r-00015
47.7 M  47.7 M /tmp/examples/terasort-output/part-r-00016
47.5 M  47.5 M /tmp/examples/terasort-output/part-r-00017
48.4 M  48.4 M /tmp/examples/terasort-output/part-r-00018
47.2 M  47.2 M /tmp/examples/terasort-output/part-r-00019
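The reducer count (and hence the number of part-r-* files) can be set explicitly with the standard mapreduce.job.reduces property; generic -D options go before the input/output paths. Echoed here as a dry run, with 20 chosen to match the run above:

```shell
# Dry run: terasort with an explicit reducer count.
echo sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort -Dmapreduce.job.reduces=20 \
  /tmp/examples/terasort-input /tmp/examples/terasort-output
```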
TeraValidate verifies the sort:
# If any out-of-order keys are found, they are written to /tmp/examples/terasort-validate
sudo -uhdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teravalidate /tmp/examples/terasort-output /tmp/examples/terasort-validate

[root@cluster3 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples
953.7 M  2.8 G    /tmp/examples/terasort-input
953.7 M  953.7 M  /tmp/examples/terasort-output
24       72       /tmp/examples/terasort-validate
[root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/terasort-validate
0   0   /tmp/examples/terasort-validate/_SUCCESS
24  72  /tmp/examples/terasort-validate/part-r-00000
[root@cluster2 ~]# sudo -uhdfs hadoop fs -cat /tmp/examples/terasort-validate/part-r-00000
checksum        4c49607ac53602
# Only a checksum was emitted, so the data is in order and the MapReduce sort worked correctly
Clean up the test data:
sudo -uhdfs hadoop fs -rm -r /tmp/examples