
  • Use a BigDataBench data source, starting from 10GB and increasing by 10GB each time up to 200GB. The code to generate the data source is as follows:

    yarn-wordcount data source generation code

     
    wget http://prof.ict.ac.cn/bdb_uploads/bdb_3_1/packages/BigDataBench_V3.2.1_Hadoop.tar.gz
    tar zxf BigDataBench_V3.2.1_Hadoop.tar.gz
    cd BigDataBench_V3.2.1_Hadoop_Hive/MicroBenchmarks
    sh genData_MicroBenchmarks.sh 200
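
    The run script below expects the generated files to be on HDFS under $WORK_DIR/data-MicroBenchmarks/in (see its echo line). Assuming the generator leaves its output in a local data-MicroBenchmarks/in directory, a staging step along these lines could be used; the exact source path is an assumption and may differ in your BigDataBench layout:

    # Assumed local output location of genData_MicroBenchmarks.sh; adjust as needed.
    WORK_DIR=`pwd`
    ${HADOOP_HOME}/bin/hadoop fs -mkdir -p ${WORK_DIR}/data-MicroBenchmarks/in/
    ${HADOOP_HOME}/bin/hadoop fs -put data-MicroBenchmarks/in/* ${WORK_DIR}/data-MicroBenchmarks/in/
    # Sanity check: confirm roughly 200GB is on HDFS before starting the test loop.
    ${HADOOP_HOME}/bin/hadoop fs -du -s -h ${WORK_DIR}/data-MicroBenchmarks/in/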

    Data is pulled from the data source in successive increments, the wordcount test is run, and the results are printed. The code is as follows:

    yarn-wordcount-run.sh

     
    #!/bin/bash
    source /home/hadoop/.bashrc
    WORK_DIR=`pwd`
    echo "WORK_DIR=$WORK_DIR data should be put in $WORK_DIR/data-MicroBenchmarks/in"
    ${HADOOP_HOME}/bin/hadoop fs -mkdir -p /MicroBenchmarks/in/
    # reset: move any leftover input from a previous run back to the staging directory
    ${HADOOP_HOME}/bin/hadoop fs -mv /MicroBenchmarks/in/* ${WORK_DIR}/data-MicroBenchmarks/in/
    for i in {1..20}
    do
        # move 10GB (20 files) from ${WORK_DIR}/data-MicroBenchmarks/in/ to /MicroBenchmarks/in/
        for f in `${HADOOP_HOME}/bin/hadoop fs -ls ${WORK_DIR}/data-MicroBenchmarks/in/ | grep lda_wiki1w | awk '{print $NF}' | head -n 20`
        do
            ${HADOOP_HOME}/bin/hadoop fs -mv $f /MicroBenchmarks/in/
        done
        # clear the previous output directory and report how many input files are in place
        ${HADOOP_HOME}/bin/hadoop fs -rmr ${WORK_DIR}/data-MicroBenchmarks/out/wordcount
        echo -n "file count = "
        ${HADOOP_HOME}/bin/hadoop fs -ls /MicroBenchmarks/in/ | grep lda_wiki1w | wc -l
        # run the wordcount example on the current input set and time it
        time ${HADOOP_HOME}/bin/hadoop jar \
            ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
            wordcount /MicroBenchmarks/in/ ${WORK_DIR}/data-MicroBenchmarks/out/wordcount
    done
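
    Since `time` writes its measurements to stderr, it is convenient to capture both streams in one log when running the script; a minimal example (the log file name is illustrative):

    # Run the benchmark and keep stdout plus the `time` measurements in a single log.
    sh yarn-wordcount-run.sh 2>&1 | tee yarn-wordcount-run.log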

    The test results for the four clusters are as follows:

    Table 2.1 Wordcount test results on the YARN clusters:

    |           |        Elapsed time (s)       |    Processing speed (MB/s)    |
    | Data size |    B2 |  B2-x |    D1 |  D1-x |    B2 |  B2-x |    D1 |  D1-x |
    |-----------|-------|-------|-------|-------|-------|-------|-------|-------|
    | 10G       |   332 |   176 |   319 |   177 | 30.84 | 58.18 | 32.10 | 57.85 |
    | 20G       |   624 |   319 |   618 |   322 | 32.82 | 64.20 | 33.14 | 63.60 |
    | 30G       |   990 |   455 |   922 |   462 | 31.03 | 67.52 | 33.32 | 66.49 |
    | 40G       |  1267 |   599 |  1210 |   613 | 32.33 | 68.38 | 33.85 | 66.82 |
    | 50G       |  1536 |   752 |  1518 |   757 | 33.33 | 68.09 | 33.73 | 67.64 |
    | 60G       |  1834 |   903 |  1840 |   925 | 33.50 | 68.04 | 33.39 | 66.42 |
    | 70G       |  2113 |  1041 |  2111 |  1064 | 33.92 | 68.86 | 33.96 | 67.37 |
    | 80G       |  2504 |  1184 |  2411 |  1215 | 32.72 | 69.19 | 33.98 | 67.42 |
    | 90G       |  2785 |  1334 |  2746 |  1360 | 33.09 | 69.09 | 33.56 | 67.76 |
    | 100G      |  3073 |  1476 |  3006 |  1469 | 33.32 | 69.38 | 34.07 | 69.71 |
    | 110G      |  3290 |  1624 |  3312 |  1625 | 34.24 | 69.36 | 34.01 | 69.32 |
    | 120G      |  3633 |  1797 |  3619 |  1793 | 33.82 | 68.38 | 33.95 | 68.53 |
    | 130G      |  3986 |  1910 |  3915 |  1930 | 33.40 | 69.70 | 34.00 | 68.97 |
    | 140G      |  4200 |  2053 |  4218 |  2090 | 34.13 | 69.83 | 33.99 | 68.59 |
    | 150G      |  4562 |  2208 |  4517 |  2226 | 33.67 | 69.57 | 34.00 | 69.00 |
    | 160G      |  4813 |  2356 |  4732 |  2369 | 34.04 | 69.54 | 33.91 | 69.16 |
    | 170G      |  5146 |  2496 |  5120 |  2521 | 33.83 | 69.74 | 34.00 | 69.05 |
    | 180G      |  5453 |  2686 |  5413 |  2662 | 33.80 | 68.62 | 34.05 | 69.24 |
    | 190G      |  5712 |  2795 |  5723 |  2804 | 34.06 | 69.61 | 34.00 | 69.39 |
    | 200G      |  6147 |  2938 |  6031 |  2953 | 33.32 | 69.71 | 33.96 | 69.35 |
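
    The processing-speed column is derived directly from the elapsed time: speed (MB/s) = data size in MB divided by elapsed seconds. For example, the 10G/B2 cell gives 10 × 1024 / 332 ≈ 30.84 MB/s. A minimal sketch of that conversion (the input values are just the 10G/B2 example, not part of the benchmark scripts):

    # Convert a measured wordcount run into the MB/s figure used in Table 2.1.
    # size_gb and elapsed_s are example inputs taken from the 10G / B2 cell.
    size_gb=10
    elapsed_s=332
    echo "scale=2; $size_gb * 1024 / $elapsed_s" | bc    # prints 30.84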

    Figure 2.1 Wordcount processing time on the YARN clusters

    Figure 2.2 Wordcount processing speed on the YARN clusters

    BigDataBench introduction and download link:

    http://prof.ict.ac.cn/BigDataBench/dowloads/

