Hadoop Cluster Performance Testing
吴脂娟 2022/02/08

Test report list

Operating system disk I/O test

Hadoop's built-in benchmark tools

Cluster performance is tested with the benchmark tools bundled with Hadoop. The test platform is Hadoop 3.0.0 on CDH 6.3.2.

TestDFSIO: HDFS I/O performance test

  1. HDFS write performance:
    # Test: write 10, 20, 100, and 1000 files of 128 MB each to the HDFS cluster
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 10 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 20 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 100 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 1000 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    # Results:
    [root@cluster2 hadoop-mapreduce]# cat /tmp/TestDFSIO_results.log
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:24:02 CST 2022
            Number of files: 10
    Total MBytes processed: 1280
          Throughput mb/sec: 12.48
    Average IO rate mb/sec: 67.14
      IO rate std deviation: 87.56
        Test exec time sec: 35.84
    
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:25:25 CST 2022
            Number of files: 20
    Total MBytes processed: 2560
          Throughput mb/sec: 10.74
    Average IO rate mb/sec: 66.8
      IO rate std deviation: 81.76
        Test exec time sec: 46.41
    
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:32:15 CST 2022
            Number of files: 100
    Total MBytes processed: 12800
          Throughput mb/sec: 3.05
    Average IO rate mb/sec: 48.24
      IO rate std deviation: 66.78
        Test exec time sec: 175.71
    
    ----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 15:30:14 CST 2022
        Number of files: 1000
    Total MBytes processed: 128000
          Throughput mb/sec: 3.03
    Average IO rate mb/sec: 62.9
      IO rate std deviation: 91.22
        Test exec time sec: 1455.23
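The `Throughput mb/sec` figure above is a per-task rate; a rough cluster-wide aggregate for a run is that figure multiplied by the number of concurrently written files. A sketch over a local excerpt of the first result block above (the `/tmp/dfsio_sample.log` path is just a scratch file for illustration):

```shell
# Estimate aggregate write throughput from a TestDFSIO result block:
# aggregate ≈ per-task throughput × number of concurrent files.
cat > /tmp/dfsio_sample.log <<'EOF'
----- TestDFSIO ----- : write
        Number of files: 10
      Throughput mb/sec: 12.48
EOF

awk -F': ' '
  /Number of files/    { n = $2 }
  /Throughput mb\/sec/ { printf "estimated aggregate: %.1f MB/s\n", n * $2 }
' /tmp/dfsio_sample.log
# prints: estimated aggregate: 124.8 MB/s
```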
    
  2. HDFS read performance:
    # Test: read 10, 20, 100, and 1000 files of 128 MB each from the HDFS cluster
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 10 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 20 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 100 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 1000 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    # Results:
    [root@cdh04 ~]# cat /tmp/TestDFSIO_results.log
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:35:34 CST 2022
            Number of files: 10
    Total MBytes processed: 1280
          Throughput mb/sec: 84.21
    Average IO rate mb/sec: 856.02
      IO rate std deviation: 447.21
        Test exec time sec: 22.54
    
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:36:09 CST 2022
            Number of files: 20
    Total MBytes processed: 2560
          Throughput mb/sec: 235.99
    Average IO rate mb/sec: 721.61
      IO rate std deviation: 278.21
        Test exec time sec: 23.57
    
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:38:54 CST 2022
            Number of files: 100
    Total MBytes processed: 12800
          Throughput mb/sec: 44.66
    Average IO rate mb/sec: 661.21
      IO rate std deviation: 436.25
        Test exec time sec: 133.27
    
    ----- TestDFSIO ----- : read
               Date & time: Thu Feb 10 15:36:04 CST 2022
           Number of files: 1000
    Total MBytes processed: 128000
          Throughput mb/sec: 41.93
    Average IO rate mb/sec: 440.92
      IO rate std deviation: 311.14
        Test exec time sec: 297.88
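In every run above, `Average IO rate` is far higher than `Throughput`. This is expected: TestDFSIO's throughput divides total megabytes by the summed task I/O time, while the average IO rate is the arithmetic mean of each task's own rate, so a few slow tasks drag throughput down much harder than they drag down the average. A two-task toy example (sizes and times are made up for illustration):

```shell
# Two hypothetical 128 MB tasks: a fast one (1 s) and a slow one (10 s).
awk 'BEGIN {
  size = 128; t1 = 1; t2 = 10
  # Throughput: total MB over total task time
  printf "Throughput mb/sec: %.2f\n", (2 * size) / (t1 + t2)
  # Average IO rate: mean of the per-task rates
  printf "Average IO rate mb/sec: %.2f\n", (size / t1 + size / t2) / 2
}'
# prints:
# Throughput mb/sec: 23.27
# Average IO rate mb/sec: 70.40
```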
    
    
  3. Clean up test data:
     # Inspect the test data; by default it is stored under /benchmarks on HDFS
     [root@cdh04 ~]# hadoop fs -du -h /benchmarks/TestDFSIO
     # Column 1: logical file size; column 2: raw usage with the default HDFS replication factor of 3
     11.0 K  33.1 K  /benchmarks/TestDFSIO/io_control
     12.5 G  37.5 G  /benchmarks/TestDFSIO/io_data
     85      255     /benchmarks/TestDFSIO/io_read
     83      249     /benchmarks/TestDFSIO/io_write
    
     # Remove the test data
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
     TestDFSIO -clean
    

nnbench: NameNode stress test

  1. NameNode load test:
    nnbench generates a large number of HDFS-related requests to put heavy load on the NameNode. It can simulate creating, reading, renaming, and deleting files on HDFS.
    # 12 mappers and 2 reducers creating 1000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 12 \
    -reduces 2 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 1000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:45:07 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:45:07 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:45:07 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:45:07,665
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:45:07 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:44:55,886
    22/02/10 10:45:07 INFO hdfs.NNBench:                            Maps to run: 12
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Reduces to run: 2
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:45:07 INFO hdfs.NNBench:                        Number of files: 1000
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:45:07 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:         # maps that missed the barrier: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                           # exceptions: 12000
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:45:07 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:45:07 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:45:07 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 52584
    22/02/10 10:45:07 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 4511.0
    22/02/10 10:45:07 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 12000
    
    # 30 mappers and 3 reducers creating 10000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 30 \
    -reduces 3 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 10000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:48:23 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:48:23 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:48:23 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:48:23,567
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:48:23 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:48:08,52
    22/02/10 10:48:23 INFO hdfs.NNBench:                            Maps to run: 30
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Reduces to run: 3
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:48:23 INFO hdfs.NNBench:                        Number of files: 10000
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:48:23 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:         # maps that missed the barrier: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                           # exceptions: 30000
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:48:23 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:48:23 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:48:23 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 237510
    22/02/10 10:48:23 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 8475.0
    22/02/10 10:48:23 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 30000
    
    # 100 mappers and 50 reducers creating 10000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 100 \
    -reduces 50 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 10000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:51:47 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:51:47 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:51:47 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:51:47,246
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:51:47 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:51:13,989
    22/02/10 10:51:47 INFO hdfs.NNBench:                            Maps to run: 100
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Reduces to run: 50
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:51:47 INFO hdfs.NNBench:                        Number of files: 10000
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:51:47 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:         # maps that missed the barrier: 61
    22/02/10 10:51:47 INFO hdfs.NNBench:                           # exceptions: 39000
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:51:47 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:51:47 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:51:47 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 319190
    22/02/10 10:51:47 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 1.644461482876E12
    22/02/10 10:51:47 INFO hdfs.NNBench:                    RAW DATA: Late maps: 61
    22/02/10 10:51:47 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 39000
    
    
  2. Clean up test data:
     sudo -uhdfs hadoop fs -rm -r /benchmarks/NNBench-cluster0
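Note that all three runs above report `Successful file operations: 0` with TPS 0 and NaN latencies: every single operation raised an exception (often a permissions or base-directory problem), so the numbers measure error handling rather than real NameNode capacity, and the runs should be repeated once the cause is fixed. A quick sanity check over a summary can flag this; the sketch below uses a local excerpt of the first summary above:

```shell
# Flag an NNBench run in which every operation failed.
cat > /tmp/nnbench_sample.log <<'EOF'
Successful file operations: 0
              # exceptions: 12000
EOF

awk -F': ' '
  /Successful file operations/ { ok = $2 }
  /# exceptions/               { ex = $2 }
  END {
    if (ok == 0 && ex > 0) print "run failed: every operation threw an exception"
    else                   print "run ok"
  }
' /tmp/nnbench_sample.log
# prints: run failed: every operation threw an exception
```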
    

mrbench: MapReduce job test

mrbench runs one small job many times over, checking whether small jobs on the cluster run repeatably and efficiently

# Run a single small job 10 times
sudo -uhdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
mrbench \
-numRuns 10

# Results:
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Map output bytes=3
		Map output materialized bytes=37
		Input split bytes=242
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=37
		Reduce input records=1
		Reduce output records=1
		Spilled Records=2
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=142
		CPU time spent (ms)=2110
		Physical memory (bytes) snapshot=1510125568
		Virtual memory (bytes) snapshot=7833825280
		Total committed heap usage (bytes)=1857552384
		Peak Map Physical memory (bytes)=591335424
		Peak Map Virtual memory (bytes)=2607177728
		Peak Reduce Physical memory (bytes)=329736192
		Peak Reduce Virtual memory (bytes)=2620637184
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=3
	File Output Format Counters 
		Bytes Written=3
DataLines	Maps	Reduces	AvgTime (milliseconds)
1		2	1	17541
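The summary line printed at the end is tab-separated: data lines, maps, reduces, and average job time in milliseconds. The average can be converted to seconds with a one-liner over a local copy of that line:

```shell
# Convert mrbench's AvgTime column (last field, in milliseconds) to seconds.
printf '1\t\t2\t1\t17541\n' | awk '{ printf "avg job time: %.1f s\n", $NF / 1000 }'
# prints: avg job time: 17.5 s
```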

# Clean up
sudo -uhdfs hadoop fs -rm -r /benchmarks/MRBench

TeraSort: MapReduce sort test

  1. TeraGen generates random input data:

     # First generate ~1 GB of test data (10,000,000 rows of 100 bytes) under /tmp/examples/terasort-input
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     teragen 10000000 /tmp/examples/terasort-input
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/
     953.7 M  2.8 G  /tmp/examples/terasort-input
       
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-input
     0        0      /tmp/examples/terasort-input/_SUCCESS
     476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00000
     476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00001
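TeraGen writes fixed 100-byte rows, so its first argument is a row count, not a size: the 10,000,000 rows above come to 10^9 bytes ≈ 953.7 MiB, matching the `du` output. The row count for any target size follows the same arithmetic; the helper name and the 1 TB figure below are illustrative only:

```shell
# TeraGen row count for a target size: rows = target_bytes / 100.
rows_for_gb() { echo $(( $1 * 1000 * 1000 * 1000 / 100 )); }

rows_for_gb 1     # 10000000    -> the ~1 GB run above
rows_for_gb 1000  # 10000000000 -> a full 1 TB TeraSort
```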
    
  2. Run TeraSort:

    # Read /tmp/examples/terasort-input and write the sorted result to /tmp/examples/terasort-output.
    # No reducer count was set on the command line; this run used 20 reducers (see the part-r-* files below).
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     terasort /tmp/examples/terasort-input /tmp/examples/terasort-output
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples
     953.7 M  2.8 G    /tmp/examples/terasort-input
     953.7 M  953.7 M  /tmp/examples/terasort-output
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-output
     0       0       /tmp/examples/terasort-output/_SUCCESS
     209     2.0 K   /tmp/examples/terasort-output/_partition.lst
     46.9 M  46.9 M  /tmp/examples/terasort-output/part-r-00000
     48.5 M  48.5 M  /tmp/examples/terasort-output/part-r-00001
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00002
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00003
     48.5 M  48.5 M  /tmp/examples/terasort-output/part-r-00004
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00005
     47.9 M  47.9 M  /tmp/examples/terasort-output/part-r-00006
     48.6 M  48.6 M  /tmp/examples/terasort-output/part-r-00007
     47.0 M  47.0 M  /tmp/examples/terasort-output/part-r-00008
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00009
     46.5 M  46.5 M  /tmp/examples/terasort-output/part-r-00010
     47.3 M  47.3 M  /tmp/examples/terasort-output/part-r-00011
     47.8 M  47.8 M  /tmp/examples/terasort-output/part-r-00012
     47.0 M  47.0 M  /tmp/examples/terasort-output/part-r-00013
     48.8 M  48.8 M  /tmp/examples/terasort-output/part-r-00014
     48.2 M  48.2 M  /tmp/examples/terasort-output/part-r-00015
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00016
     47.5 M  47.5 M  /tmp/examples/terasort-output/part-r-00017
     48.4 M  48.4 M  /tmp/examples/terasort-output/part-r-00018
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00019
    
  3. TeraValidate verification:

    # If any out-of-order keys are detected, they are written to /tmp/examples/terasort-validate
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     teravalidate  /tmp/examples/terasort-output /tmp/examples/terasort-validate
     
     [root@cluster3 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples
     953.7 M  2.8 G    /tmp/examples/terasort-input
     953.7 M  953.7 M  /tmp/examples/terasort-output
     24       72       /tmp/examples/terasort-validate
    
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-validate
     0   0   /tmp/examples/terasort-validate/_SUCCESS
     24  72  /tmp/examples/terasort-validate/part-r-00000
     
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -cat /tmp/examples/terasort-validate/part-r-00000
     checksum	4c49607ac53602
     # A checksum with no error records means the data is globally sorted; the MapReduce sort works correctly
    
    
  4. Clean up

     sudo -uhdfs hadoop fs -rm -r /tmp/examples