Connecting Spark to MongoDB


This post uses Ubuntu 16.04 + Java 8 + Spark 2.2.0 + MongoDB 3.4.7. If you have not yet set up these components, you can refer to my earlier posts:

Java environment setup: http://blog.csdn.net/lengconglin/article/details/77016911
MongoDB installation: http://blog.csdn.net/lengconglin/article/details/77072842
Spark development environment setup: http://blog.csdn.net/lengconglin/article/details/77847623

Before creating the project, make sure both Spark and MongoDB are running:
Start Spark: in the spark/sbin directory, run ./start-all.sh
Start MongoDB: in the bin directory of the MongoDB installation, run ./mongod

Create a Scala project in IntelliJ IDEA and add the following dependencies to build.sbt; the commented-out lines can be enabled as needed:

libraryDependencies ++= Seq("org.apache.spark" % "spark-core_2.11" % "2.2.0")
libraryDependencies += "org.mongodb.spark" % "mongo-spark-connector_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-mllib_2.11" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-graphx_2.11" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-streaming-flume_2.11" % "2.2.0"
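
Note that the _2.11 suffix in the artifact names pins the Scala binary version, while the %% operator appends your project's scalaVersion automatically, so the two must agree. A minimal sketch of the matching top of build.sbt (the project name and exact patch version here are illustrative, not from the original post):

name := "mongoTest"

version := "1.0"

scalaVersion := "2.11.11"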

Then, under src->main->scala, create a new Scala Class, choose Kind = Object, and name it mongoTest.scala:

import com.mongodb.spark._
import org.apache.spark.{SparkConf, SparkContext}
import org.bson._

object mongoTest {
  def main(args: Array[String]): Unit = {
    // Replace "mongodb://127.0.0.1/forest.lengconglin" with your own host and database.collection
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("Mingdao-Score")
      .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/forest.lengconglin?readPreference=primaryPreferred")
      .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/forest.lengconglin")
    val sc = new SparkContext(conf)

    // Write ten test documents to MongoDB
    val documents = sc.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))
    MongoSpark.save(documents)

    // Read the collection back from MongoDB
    val rdd = MongoSpark.load(sc)
    rdd.foreach(println)
    println("Count: " + rdd.count)
  }
}

Compile and run the program; you should see the inserted documents printed, followed by the count.
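
Since build.sbt already pulls in spark-sql, the same collection can also be read through the DataFrame API instead of the RDD API. Below is a minimal sketch assuming the same URI as above; the object name mongoSqlTest and the filter on the test field are illustrative, not part of the original post:

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

object mongoSqlTest {
  def main(args: Array[String]): Unit = {
    // Same URIs as the RDD example above; replace with your own host and database.collection
    val spark = SparkSession.builder()
      .master("local")
      .appName("MongoSparkSQL")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/forest.lengconglin")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/forest.lengconglin")
      .getOrCreate()

    // Load the collection as a DataFrame; the connector infers a schema by sampling documents
    val df = MongoSpark.load(spark)
    df.printSchema()

    // Filters are pushed down to MongoDB where possible
    // (the "test" field comes from the documents written by the RDD example)
    df.filter(df("test") > 5).show()

    spark.stop()
  }
}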
