3.1 Java - WordCount Example
Spark ships several Java example programs under its examples directory. Here we use WordCount to show how to run one:
spark-submit --class org.apache.spark.examples.JavaWordCount --master yarn --deploy-mode cluster /home/hadoop/spark/lib/spark-examples*.jar /input/kv.txt /output/wordcount
Notes:
1. Prepare the /input/kv.txt data on HDFS beforehand; any plain-text file will do.
2. /input/kv.txt /output/wordcount are the program arguments; for other programs, replace them with the corresponding arguments.
3. No resources are requested explicitly here; in real tests, size the resources according to the workload being run (see the sketch below).
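As a rough sketch of what that looks like (the executor count and memory sizes below are purely illustrative, not tuned recommendations), the same example can be submitted with explicit resource flags:
spark-submit --class org.apache.spark.examples.JavaWordCount \
  --master yarn --deploy-mode cluster \
  --num-executors 2 --executor-memory 1g --executor-cores 1 \
  /home/hadoop/spark/lib/spark-examples*.jar /input/kv.txt /output/wordcount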
Developing the Spark Java program with Eclipse:
● Create a Java Project
● Create a lib directory, put the dependency jar spark-assembly-1.4.1-hadoop2.6.0-cdh5.4.4.jar into it, and add it to the build path
● Write the application code
The code is as follows:
package test.java.spark.job;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public final class JavaWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    if (args.length < 2) {
      System.err.println("Usage: JavaWordCount <input> <output>");
      System.exit(1);
    }

    SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    JavaRDD<String> lines = ctx.textFile(args[0], 1);

    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String s) {
        return Arrays.asList(SPACE.split(s));
      }
    });

    JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
      }
    });

    JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
      }
    });

    List<Tuple2<String, Integer>> output = counts.collect();
    counts.saveAsTextFile(args[1]);
    for (Tuple2<?, ?> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    ctx.stop();
  }
}

● Export the project as a jar file
● Prepare the data
Upload a piece of text to this HDFS location: /input/kv1.txt
hdfs dfs -put kv1.txt /input/kv1.txt
● Submit the job
spark-submit --class test.java.spark.job.JavaWordCount --master yarn --deploy-mode client --num-executors 1 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/sparkJavaExample.jar /input/kv1.txt /output/wordcount
Note:
Do not request more resources than the current machine's quota.
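Once the job has finished, the counted words can be inspected directly on HDFS. A minimal check (the part-* file names are the standard Hadoop output layout and may differ slightly on your cluster):
hdfs dfs -ls /output/wordcount
hdfs dfs -cat /output/wordcount/part-*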
3.2 Scala - HiveFromSpark Example
● Install sbt
curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo
sudo mv bintray-sbt-rpm.repo /etc/yum.repos.d/
sudo yum install sbt -y
● Build the code
Take HiveFromSpark from the Spark examples as an example:
mkdir -p /data/HiveFromSpark/src/main/scala/com/ucloud/spark/examples
cd /data/HiveFromSpark/src/main/scala/com/ucloud/spark/examples
touch HiveFromSpark.scala
Taking Spark 1.6.0 and Scala 2.10.5 as an example, add the following code to HiveFromSpark.scala.
package com.ucloud.spark.examples

import com.google.common.io.{ByteStreams, Files}
import java.io.File

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveFromSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveFromSpark")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._
    import hiveContext.sql

    // Queries are expressed in HiveQL
    println("Result of 'SELECT *': ")
    sql("SELECT * FROM src").collect().foreach(println)

    // Aggregation queries are also supported.
    val count = sql("SELECT COUNT(*) FROM src").collect().head.getLong(0)
    println(s"COUNT(*): $count")

    // The results of SQL queries are themselves RDDs and support all normal RDD functions. The
    // items in the RDD are of type Row, which allows you to access each column by ordinal.
    val rddFromSql = sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key")

    println("Result of RDD.map:")
    val rddAsStrings = rddFromSql.map {
      case Row(key: Int, value: String) => s"Key: $key, Value: $value"
    }

    // You can also register RDDs as temporary tables within a HiveContext.
    val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
    rdd.toDF().registerTempTable("records")

    // Queries can then join RDD data with data stored in Hive.
    println("Result of SELECT *:")
    sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)

    sc.stop()
  }
}

● Create the sbt build file
cd /data/HiveFromSpark/
touch HiveFromSpark.sbt
● Add the following content to the file:
name := "HiveFromSpark"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.0"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.0"
● Compile
cd /data/HiveFromSpark/
sbt package
Because sbt needs to download the dependencies from Maven, build time depends on your network environment; please be patient. The compiled jar is located at /data/HiveFromSpark/target/scala-2.10/hivefromspark_2.10-1.0.jar.
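As an optional sanity check (assuming a JDK's jar tool is on the PATH), you can confirm that the compiled class actually made it into the jar:
jar tf /data/HiveFromSpark/target/scala-2.10/hivefromspark_2.10-1.0.jar | grep HiveFromSpark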
● Run
Client mode:
spark-submit --class com.ucloud.spark.examples.HiveFromSpark --master yarn --deploy-mode client --num-executors 4 --executor-cores 1 /data/HiveFromSpark/target/scala-2.10/hivefromspark_2.10-1.0.jar
Cluster mode:
spark-submit --class com.ucloud.spark.examples.HiveFromSpark --master yarn --deploy-mode cluster --num-executors 4 --executor-cores 1 --files /home/hadoop/hive/conf/hive-site.xml --jars /home/hadoop/spark/lib/datanucleus-api-jdo-3.2.6.jar,/home/hadoop/spark/lib/datanucleus-rdbms-3.2.9.jar,/home/hadoop/spark/lib/datanucleus-core-3.2.10.jar /data/HiveFromSpark/target/scala-2.10/hivefromspark_2.10-1.0.jar
In cluster mode the driver runs inside the cluster, so the Hive configuration (hive-site.xml) and the DataNucleus jars have to be shipped along with the job, which is why the extra --files and --jars options are needed.
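Note that the example queries a Hive table named src, which must exist before the job is run. If your cluster does not have it, one minimal way to create and load it (a sketch assuming the Hive CLI is available; the kv1.txt path below points at the sample file shipped with Spark and may differ in your installation) is:
hive -e "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"
hive -e "LOAD DATA LOCAL INPATH '/home/hadoop/spark/examples/src/main/resources/kv1.txt' INTO TABLE src"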
3.3 Python - Pi Example
Note: refer to the example programs under the examples/src/main/python directory of the Spark installation.
Sample code:
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    spark.stop()

The program samples n random points in the 2 x 2 square centred on the origin; the fraction that lands inside the unit circle approximates π/4, so 4.0 * count / n approximates π.
The Spark Pi job can be submitted with the following command:
spark-submit --master yarn --deploy-mode client --num-executors 4 --executor-cores 1 --executor-memory 2G $SPARK_HOME/examples/src/main/python/pi.py 100
The console log will eventually show a result similar to "Pi is roughly 3.141039".
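If you just want to verify the script without going through YARN, it can also be run locally as a quick sketch (local[2] simply means two local worker threads; the smaller partition count keeps the run short):
spark-submit --master local[2] $SPARK_HOME/examples/src/main/python/pi.py 10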