Hadoop Cluster Performance Testing
吴脂娟 2022/02/08

Test report list

Operating system disk I/O test

Hadoop's built-in benchmark tools

Cluster performance is tested with the benchmark tools bundled with Hadoop. The test platform is Hadoop 3.0.0 on CDH 6.3.2.

TestDFSIO: HDFS I/O performance test

  1. HDFS write performance:
    # Test: write 10, 20, 100, and 1000 files of 128 MB each to the HDFS cluster
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 10 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 20 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 100 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -write \
    -nrFiles 1000 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    # Results:
    [root@cluster2 hadoop-mapreduce]# cat /tmp/TestDFSIO_results.log
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:24:02 CST 2022
            Number of files: 10
    Total MBytes processed: 1280
          Throughput mb/sec: 12.48
    Average IO rate mb/sec: 67.14
      IO rate std deviation: 87.56
        Test exec time sec: 35.84
    
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:25:25 CST 2022
            Number of files: 20
    Total MBytes processed: 2560
          Throughput mb/sec: 10.74
    Average IO rate mb/sec: 66.8
      IO rate std deviation: 81.76
        Test exec time sec: 46.41
    
    ----- TestDFSIO ----- : write
                Date & time: Thu Feb 10 10:32:15 CST 2022
            Number of files: 100
    Total MBytes processed: 12800
          Throughput mb/sec: 3.05
    Average IO rate mb/sec: 48.24
      IO rate std deviation: 66.78
        Test exec time sec: 175.71
    
    ----- TestDFSIO ----- : write
            Date & time: Thu Feb 10 15:30:14 CST 2022
        Number of files: 1000
    Total MBytes processed: 128000
          Throughput mb/sec: 3.03
    Average IO rate mb/sec: 62.9
      IO rate std deviation: 91.22
        Test exec time sec: 1455.23
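The `Throughput mb/sec` figure above is a per-task rate; a rough cluster-wide aggregate for a run is that figure multiplied by the number of concurrently written files. A sketch over a local excerpt of the first result block above (the `/tmp/dfsio_sample.log` path is just a scratch file for illustration):

```shell
# Estimate aggregate write throughput from a TestDFSIO result block:
# aggregate ≈ per-task throughput × number of concurrent files.
cat > /tmp/dfsio_sample.log <<'EOF'
----- TestDFSIO ----- : write
        Number of files: 10
      Throughput mb/sec: 12.48
EOF

awk -F': ' '
  /Number of files/    { n = $2 }
  /Throughput mb\/sec/ { printf "estimated aggregate: %.1f MB/s\n", n * $2 }
' /tmp/dfsio_sample.log
# prints: estimated aggregate: 124.8 MB/s
```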
    
  2. HDFS read performance:
    # Test: read 10, 20, 100, and 1000 files of 128 MB each from the HDFS cluster
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 10 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 20 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 100 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO \
    -read \
    -nrFiles 1000 \
    -size 128MB \
    -resFile /tmp/TestDFSIO_results.log
    
    # Results:
    [root@cdh04 ~]# cat /tmp/TestDFSIO_results.log
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:35:34 CST 2022
            Number of files: 10
    Total MBytes processed: 1280
          Throughput mb/sec: 84.21
    Average IO rate mb/sec: 856.02
      IO rate std deviation: 447.21
        Test exec time sec: 22.54
    
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:36:09 CST 2022
            Number of files: 20
    Total MBytes processed: 2560
          Throughput mb/sec: 235.99
    Average IO rate mb/sec: 721.61
      IO rate std deviation: 278.21
        Test exec time sec: 23.57
    
    ----- TestDFSIO ----- : read
                Date & time: Thu Feb 10 10:38:54 CST 2022
            Number of files: 100
    Total MBytes processed: 12800
          Throughput mb/sec: 44.66
    Average IO rate mb/sec: 661.21
      IO rate std deviation: 436.25
        Test exec time sec: 133.27
    
    ----- TestDFSIO ----- : read
               Date & time: Thu Feb 10 15:36:04 CST 2022
           Number of files: 1000
    Total MBytes processed: 128000
          Throughput mb/sec: 41.93
    Average IO rate mb/sec: 440.92
      IO rate std deviation: 311.14
        Test exec time sec: 297.88
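In every run above, `Average IO rate` is far higher than `Throughput`. This is expected: TestDFSIO's throughput divides total megabytes by the summed task I/O time, while the average IO rate is the arithmetic mean of each task's own rate, so a few slow tasks drag throughput down much harder than they drag down the average. A two-task toy example (sizes and times are made up for illustration):

```shell
# Two hypothetical 128 MB tasks: a fast one (1 s) and a slow one (10 s).
awk 'BEGIN {
  size = 128; t1 = 1; t2 = 10
  # Throughput: total MB over total task time
  printf "Throughput mb/sec: %.2f\n", (2 * size) / (t1 + t2)
  # Average IO rate: mean of the per-task rates
  printf "Average IO rate mb/sec: %.2f\n", (size / t1 + size / t2) / 2
}'
# prints:
# Throughput mb/sec: 23.27
# Average IO rate mb/sec: 70.40
```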
    
    
  3. Clean up test data:
     # Inspect the test data; by default it is stored under /benchmarks on HDFS
     [root@cdh04 ~]# hadoop fs -du -h /benchmarks/TestDFSIO
     # Column 1: logical file size; column 2: raw usage with the default HDFS replication factor of 3
     11.0 K  33.1 K  /benchmarks/TestDFSIO/io_control
     12.5 G  37.5 G  /benchmarks/TestDFSIO/io_data
     85      255     /benchmarks/TestDFSIO/io_read
     83      249     /benchmarks/TestDFSIO/io_write
    
     # Remove the test data
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
     TestDFSIO -clean
    

nnbench: NameNode stress test

  1. NameNode load test:
    nnbench generates a large number of HDFS-related requests to put heavy load on the NameNode. It can simulate creating, reading, renaming, and deleting files on HDFS.
    # 12 mappers and 2 reducers creating 1000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 12 \
    -reduces 2 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 1000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:45:07 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:45:07 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:45:07 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:45:07,665
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:45:07 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:44:55,886
    22/02/10 10:45:07 INFO hdfs.NNBench:                            Maps to run: 12
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Reduces to run: 2
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:45:07 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:45:07 INFO hdfs.NNBench:                        Number of files: 1000
    22/02/10 10:45:07 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:45:07 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:         # maps that missed the barrier: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                           # exceptions: 12000
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:45:07 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:45:07 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:45:07 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:45:07 INFO hdfs.NNBench: 
    22/02/10 10:45:07 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 52584
    22/02/10 10:45:07 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 4511.0
    22/02/10 10:45:07 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
    22/02/10 10:45:07 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 12000
    
    # 30 mappers and 3 reducers creating 10000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 30 \
    -reduces 3 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 10000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:48:23 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:48:23 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:48:23 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:48:23,567
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:48:23 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:48:08,52
    22/02/10 10:48:23 INFO hdfs.NNBench:                            Maps to run: 30
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Reduces to run: 3
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:48:23 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:48:23 INFO hdfs.NNBench:                        Number of files: 10000
    22/02/10 10:48:23 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:48:23 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:         # maps that missed the barrier: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                           # exceptions: 30000
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:48:23 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:48:23 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:48:23 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:48:23 INFO hdfs.NNBench: 
    22/02/10 10:48:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 237510
    22/02/10 10:48:23 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 8475.0
    22/02/10 10:48:23 INFO hdfs.NNBench:                    RAW DATA: Late maps: 0
    22/02/10 10:48:23 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 30000
    
    # 100 mappers and 50 reducers creating 10000 files
    sudo -uhdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar nnbench \
    -operation create_write \
    -maps 100 \
    -reduces 50 \
    -blockSize 1 \
    -bytesToWrite 0 \
    -numberOfFiles 10000 \
    -replicationFactorPerFile 3 \
    -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname`
    
    22/02/10 10:51:47 INFO hdfs.NNBench: -------------- NNBench -------------- : 
    22/02/10 10:51:47 INFO hdfs.NNBench:                                Version: NameNode Benchmark 0.4
    22/02/10 10:51:47 INFO hdfs.NNBench:                            Date & time: 2022-02-10 10:51:47,246
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Test Operation: create_write
    22/02/10 10:51:47 INFO hdfs.NNBench:                             Start time: 2022-02-10 10:51:13,989
    22/02/10 10:51:47 INFO hdfs.NNBench:                            Maps to run: 100
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Reduces to run: 50
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Block Size (bytes): 1
    22/02/10 10:51:47 INFO hdfs.NNBench:                         Bytes to write: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Bytes per checksum: 1
    22/02/10 10:51:47 INFO hdfs.NNBench:                        Number of files: 10000
    22/02/10 10:51:47 INFO hdfs.NNBench:                     Replication factor: 3
    22/02/10 10:51:47 INFO hdfs.NNBench:             Successful file operations: 0
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:         # maps that missed the barrier: 61
    22/02/10 10:51:47 INFO hdfs.NNBench:                           # exceptions: 39000
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                TPS: Create/Write/Close: 0
    22/02/10 10:51:47 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
    22/02/10 10:51:47 INFO hdfs.NNBench:             Avg Lat (ms): Create/Write: NaN
    22/02/10 10:51:47 INFO hdfs.NNBench:                    Avg Lat (ms): Close: NaN
    22/02/10 10:51:47 INFO hdfs.NNBench: 
    22/02/10 10:51:47 INFO hdfs.NNBench:                  RAW DATA: AL Total #1: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:                  RAW DATA: AL Total #2: 0
    22/02/10 10:51:47 INFO hdfs.NNBench:               RAW DATA: TPS Total (ms): 319190
    22/02/10 10:51:47 INFO hdfs.NNBench:        RAW DATA: Longest Map Time (ms): 1.644461482876E12
    22/02/10 10:51:47 INFO hdfs.NNBench:                    RAW DATA: Late maps: 61
    22/02/10 10:51:47 INFO hdfs.NNBench:              RAW DATA: # of exceptions: 39000
    
    
  2. Clean up test data:
     sudo -uhdfs hadoop fs -rm -r /benchmarks/NNBench-cluster0
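Note that all three runs above report `Successful file operations: 0` with TPS 0 and NaN latencies: every single operation raised an exception (often a permissions or base-directory problem), so the numbers measure error handling rather than real NameNode capacity, and the runs should be repeated once the cause is fixed. A quick sanity check over a summary can flag this; the sketch below uses a local excerpt of the first summary above:

```shell
# Flag an NNBench run in which every operation failed.
cat > /tmp/nnbench_sample.log <<'EOF'
Successful file operations: 0
              # exceptions: 12000
EOF

awk -F': ' '
  /Successful file operations/ { ok = $2 }
  /# exceptions/               { ex = $2 }
  END {
    if (ok == 0 && ex > 0) print "run failed: every operation threw an exception"
    else                   print "run ok"
  }
' /tmp/nnbench_sample.log
# prints: run failed: every operation threw an exception
```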
    

mrbench: MapReduce job test

mrbench runs one small job many times over, checking whether small jobs on the cluster run repeatably and efficiently

# Run a single small job 10 times
sudo -uhdfs hadoop jar \
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
mrbench \
-numRuns 10

# Results:
	Map-Reduce Framework
		Map input records=1
		Map output records=1
		Map output bytes=3
		Map output materialized bytes=37
		Input split bytes=242
		Combine input records=0
		Combine output records=0
		Reduce input groups=1
		Reduce shuffle bytes=37
		Reduce input records=1
		Reduce output records=1
		Spilled Records=2
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=142
		CPU time spent (ms)=2110
		Physical memory (bytes) snapshot=1510125568
		Virtual memory (bytes) snapshot=7833825280
		Total committed heap usage (bytes)=1857552384
		Peak Map Physical memory (bytes)=591335424
		Peak Map Virtual memory (bytes)=2607177728
		Peak Reduce Physical memory (bytes)=329736192
		Peak Reduce Virtual memory (bytes)=2620637184
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=3
	File Output Format Counters 
		Bytes Written=3
DataLines	Maps	Reduces	AvgTime (milliseconds)
1		2	1	17541
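The summary line printed at the end is tab-separated: data lines, maps, reduces, and average job time in milliseconds. The average can be converted to seconds with a one-liner over a local copy of that line:

```shell
# Convert mrbench's AvgTime column (last field, in milliseconds) to seconds.
printf '1\t\t2\t1\t17541\n' | awk '{ printf "avg job time: %.1f s\n", $NF / 1000 }'
# prints: avg job time: 17.5 s
```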

# Clean up
sudo -uhdfs hadoop fs -rm -r /benchmarks/MRBench

TeraSort: MapReduce sort test

  1. TeraGen generates random input data:

     # First generate ~1 GB of test data (10,000,000 rows of 100 bytes) under /tmp/examples/terasort-input
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     teragen 10000000 /tmp/examples/terasort-input
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h /tmp/examples/
     953.7 M  2.8 G  /tmp/examples/terasort-input
       
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-input
     0        0      /tmp/examples/terasort-input/_SUCCESS
     476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00000
     476.8 M  1.4 G  /tmp/examples/terasort-input/part-m-00001
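TeraGen writes fixed 100-byte rows, so its first argument is a row count, not a size: the 10,000,000 rows above come to 10^9 bytes ≈ 953.7 MiB, matching the `du` output. The row count for any target size follows the same arithmetic; the helper name and the 1 TB figure below are illustrative only:

```shell
# TeraGen row count for a target size: rows = target_bytes / 100.
rows_for_gb() { echo $(( $1 * 1000 * 1000 * 1000 / 100 )); }

rows_for_gb 1     # 10000000    -> the ~1 GB run above
rows_for_gb 1000  # 10000000000 -> a full 1 TB TeraSort
```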
    
  2. Run TeraSort:

    # Read /tmp/examples/terasort-input and write the sorted result to /tmp/examples/terasort-output.
    # No reducer count was set on the command line; this run used 20 reducers (see the part-r-* files below).
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     terasort /tmp/examples/terasort-input /tmp/examples/terasort-output
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples
     953.7 M  2.8 G    /tmp/examples/terasort-input
     953.7 M  953.7 M  /tmp/examples/terasort-output
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-output
     0       0       /tmp/examples/terasort-output/_SUCCESS
     209     2.0 K   /tmp/examples/terasort-output/_partition.lst
     46.9 M  46.9 M  /tmp/examples/terasort-output/part-r-00000
     48.5 M  48.5 M  /tmp/examples/terasort-output/part-r-00001
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00002
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00003
     48.5 M  48.5 M  /tmp/examples/terasort-output/part-r-00004
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00005
     47.9 M  47.9 M  /tmp/examples/terasort-output/part-r-00006
     48.6 M  48.6 M  /tmp/examples/terasort-output/part-r-00007
     47.0 M  47.0 M  /tmp/examples/terasort-output/part-r-00008
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00009
     46.5 M  46.5 M  /tmp/examples/terasort-output/part-r-00010
     47.3 M  47.3 M  /tmp/examples/terasort-output/part-r-00011
     47.8 M  47.8 M  /tmp/examples/terasort-output/part-r-00012
     47.0 M  47.0 M  /tmp/examples/terasort-output/part-r-00013
     48.8 M  48.8 M  /tmp/examples/terasort-output/part-r-00014
     48.2 M  48.2 M  /tmp/examples/terasort-output/part-r-00015
     47.7 M  47.7 M  /tmp/examples/terasort-output/part-r-00016
     47.5 M  47.5 M  /tmp/examples/terasort-output/part-r-00017
     48.4 M  48.4 M  /tmp/examples/terasort-output/part-r-00018
     47.2 M  47.2 M  /tmp/examples/terasort-output/part-r-00019
    
  3. TeraValidate verification:

    # If any out-of-order keys are detected, they are written to /tmp/examples/terasort-validate
     sudo -uhdfs hadoop jar \
     /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
     teravalidate  /tmp/examples/terasort-output /tmp/examples/terasort-validate
     
     [root@cluster3 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples
     953.7 M  2.8 G    /tmp/examples/terasort-input
     953.7 M  953.7 M  /tmp/examples/terasort-output
     24       72       /tmp/examples/terasort-validate
    
    
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -du -h  /tmp/examples/terasort-validate
     0   0   /tmp/examples/terasort-validate/_SUCCESS
     24  72  /tmp/examples/terasort-validate/part-r-00000
     
     [root@cluster2 ~]# sudo -uhdfs hadoop fs -cat /tmp/examples/terasort-validate/part-r-00000
     checksum	4c49607ac53602
     # A checksum with no error records means the data is globally sorted; the MapReduce sort works correctly
    
    
  4. Clean up

     sudo -uhdfs hadoop fs -rm -r /tmp/examples