
  • Use the BigDataBench data source, starting at 10GB and adding 10GB per round up to 200GB. The data-generation commands are as follows:

    yarn-wordcount data source generation

     
    wget http://prof.ict.ac.cn/bdb_uploads/bdb_3_1/packages/BigDataBench_V3.2.1_Hadoop.tar.gz
    tar zxf BigDataBench_V3.2.1_Hadoop.tar.gz
    cd BigDataBench_V3.2.1_Hadoop_Hive/MicroBenchmarks
    sh genData_MicroBenchmarks.sh 200

    Pull data from the source in successive batches, run the wordcount test on each, and print the results. The script is as follows:

    yarn-wordcount-run.sh

     
    #!/bin/bash
    source /home/hadoop/.bashrc
    WORK_DIR=`pwd`
    echo "WORK_DIR=$WORK_DIR data should be put in $WORK_DIR/data-MicroBenchmarks/in"
    ${HADOOP_HOME}/bin/hadoop fs -mkdir -p /MicroBenchmarks/in/
    # stage any leftover input files back into the HDFS source directory
    ${HADOOP_HOME}/bin/hadoop fs -mv /MicroBenchmarks/in/* ${WORK_DIR}/data-MicroBenchmarks/in/
    for i in {1..20}
    do
        # move 10GB of files from ${WORK_DIR}/data-MicroBenchmarks/in/ to /MicroBenchmarks/in/
        for f in `${HADOOP_HOME}/bin/hadoop fs -ls ${WORK_DIR}/data-MicroBenchmarks/in/ | grep lda_wiki1w | awk '{print $NF}' | head -n 20`
        do
            ${HADOOP_HOME}/bin/hadoop fs -mv $f /MicroBenchmarks/in/
        done
        # clear the previous round's output directory
        ${HADOOP_HOME}/bin/hadoop fs -rmr ${WORK_DIR}/data-MicroBenchmarks/out/wordcount
        echo -n "file count = "
        ${HADOOP_HOME}/bin/hadoop fs -ls /MicroBenchmarks/in/ | grep lda_wiki1w | wc -l
        # time the wordcount job over the accumulated input
        time ${HADOOP_HOME}/bin/hadoop jar \
            ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
            wordcount /MicroBenchmarks/in/ ${WORK_DIR}/data-MicroBenchmarks/out/wordcount
    done
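    How the input grows per round can be sketched as follows. This is an illustrative calculation, assuming (as the script's `head -n 20` and its 10GB comment imply) that each batch of 20 `lda_wiki1w` files totals roughly 10GB:

    ```python
    # Each loop iteration moves 20 more input files into /MicroBenchmarks/in/,
    # so iteration i runs wordcount over roughly i * 10 GB of accumulated input.
    FILES_PER_ITER = 20   # matches `head -n 20` in the script
    GB_PER_ITER = 10      # assumption: one 20-file batch is ~10 GB

    for i in range(1, 21):
        print(f"iteration {i}: {i * FILES_PER_ITER} files, ~{i * GB_PER_ITER} GB input")
    ```

    By the last iteration all 400 files (~200GB) are in the input directory, which matches the `200` argument passed to genData_MicroBenchmarks.sh.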

    The test results for the four clusters are as follows:

    Table 2.1 YARN cluster wordcount test results:

    | Data size | B2 time (s) | B2-x time (s) | D1 time (s) | D1-x time (s) | B2 (MB/s) | B2-x (MB/s) | D1 (MB/s) | D1-x (MB/s) |
    |-----------|-------------|---------------|-------------|---------------|-----------|-------------|-----------|-------------|
    | 10G       | 332         | 176           | 319         | 177           | 30.84     | 58.18       | 32.10     | 57.85       |
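    The throughput column can be reproduced from the timings: throughput = data size in MB divided by the wall-clock time reported by `time`. A minimal check against the 10G row:

    ```python
    # Throughput (MB/s) = data size in MB / wordcount wall-clock seconds.
    # Timings taken from the 10G row of Table 2.1.
    times = {"B2": 332, "B2-x": 176, "D1": 319, "D1-x": 177}
    size_mb = 10 * 1024  # 10 GB expressed in MB

    for cluster, seconds in times.items():
        print(f"{cluster}: {size_mb / seconds:.2f} MB/s")
    ```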

    20G

    624