
  • The following steps use flume-1.6.0 as an example.

    Limitations

    ● Only Hive tables stored in ORC format are supported.

    ● Tables with buckets are supported, for example:

     
    CREATE EXTERNAL TABLE stocks (
      date STRING,
      open DOUBLE,
      high DOUBLE,
      low DOUBLE,
      close DOUBLE,
      volume BIGINT,
      adj_close DOUBLE)
    PARTITIONED BY (year STRING)
    CLUSTERED BY (date) INTO 3 BUCKETS
    STORED AS ORC;

    ● Partitioning is optional.

    ● Only CSV and JSON source data formats are supported.

    ● The Hive sink is compatible with Hive 1.0.0.

    ● Add the following configuration to the metastore, then restart the metastore:

     
    hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager 
    hive.compactor.initiator.on = true 
    hive.compactor.worker.threads = 5
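
    In hive-site.xml, the three settings above take the standard property form. This is a sketch of the same values; only the property names and values come from this document:

    ```xml
    <!-- hive-site.xml: enable the transaction manager and compactor
         required by the Flume Hive sink (same values as above) -->
    <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.compactor.worker.threads</name>
      <value>5</value>
    </property>
    ```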

    Flume Configuration

    ● flume.conf

     
    a1.sources = src1 
    a1.channels = chan1 
    a1.sinks = sink1 
    a1.sources.src1.type = spooldir 
    a1.sources.src1.channels = chan1 
    a1.sources.src1.spoolDir = /root/stk 
    a1.sources.src1.interceptors = skipHeadI dateI 
    a1.sources.src1.interceptors.skipHeadI.type = regex_filter 
    a1.sources.src1.interceptors.skipHeadI.regex = ^Date.* 
    a1.sources.src1.interceptors.skipHeadI.excludeEvents = true 
    a1.sources.src1.interceptors.dateI.type = regex_extractor 
    a1.sources.src1.interceptors.dateI.regex = ^(\\d+)-.* 
    a1.sources.src1.interceptors.dateI.serializers = y 
    a1.sources.src1.interceptors.dateI.serializers.y.name = year 
    a1.channels.chan1.type = memory 
    a1.channels.chan1.capacity = 1000 
    a1.channels.chan1.transactionCapacity = 100 
    a1.sinks.sink1.type = hive
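
    The sink section above stops at the type declaration. A typical completion might look like the sketch below; the metastore URI, database, and table names are placeholder assumptions that would need to match the local environment, not values from this document:

    ```
    # Hypothetical continuation of a1.sinks.sink1 -- adjust the metastore
    # URI and the database/table names to your environment.
    a1.sinks.sink1.channel = chan1
    a1.sinks.sink1.hive.metastore = thrift://127.0.0.1:9083
    a1.sinks.sink1.hive.database = default
    a1.sinks.sink1.hive.table = stocks
    a1.sinks.sink1.hive.partition = %{year}
    a1.sinks.sink1.serializer = DELIMITED
    a1.sinks.sink1.serializer.delimiter = ,
    a1.sinks.sink1.serializer.fieldnames = date,open,high,low,close,volume,adj_close
    ```

    Here serializer.fieldnames maps the CSV columns to the columns of the stocks table, and %{year} fills the partition from the year header set by the dateI regex_extractor interceptor.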