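Characteristics of each

RDD: Spark RDD
DataFrame: Spark DataFrame
DataSet: Spark DataSet

Importing the implicit conversions

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

    // Create the SparkSession object
    val session = SparkSession.builder.master("local[*]").appName("RDDto").getOrCreate()
    // Import the implicit conversions
    import session.implicits._
    // SparkContext object
    val sc = session.sparkContext

Converting an RDD to the other types

The RDD

    val listRDD: RDD[(String, String, Int)] = sc.makeRDD(List(("1", "Bob", 12), ("2", "Bigdataboy", 16)))

To a DataFrame

toDF(column names*)

    // Convert to a DataFrame
    val RDDtoDF: DataFrame = listRDD.toDF("id", "name", "age")

To a DataSet

Create a case class

    case class User(id: BigInt, name: String, age: Int)

Convert

    // Wrap each row in the case class; id is a String in the RDD, so convert it to BigInt
    val UserRDD: RDD[User] = listRDD.map { case (id, name, age) => User(BigInt(id), name, age) }
    // Convert to a DataSet
    val RDDtoDS: Dataset[User] = UserRDD.toDS()

Converting a DataFrame to the other types

The file

    {"id":1,"name":"Bigdataboy","age":"18"}
    {"id":2,"name":"Bob","age":"16"}
    {"id":3,"name":"Black","age":"18"}

Create the DataFrame

    val jsonDF: DataFrame = session.read.json("indata/data.json")

To an RDD

    val toRDD: RDD[Row] = jsonDF.rdd

To a DataSet

Case class (age is a String here because the JSON file stores it as a quoted string)

    case class User(id: BigInt, name: String, age: String)

Convert by adding as[type] to the DataFrame

    val jsonToDS: Dataset[User] = jsonDF.as[User]

Converting a DataSet to the other types

Case class

    case class User(id: BigInt, name: String, age: String)

Create the DataSet

    val UserDS: Dataset[User] = List(User(1, "Bob", "12"), User(2, "Bigdata", "16")).toDS()

To an RDD

    val UserRDD: RDD[User] = UserDS.rdd

To a DataFrame

    val UserDF: DataFrame = UserDS.toDF()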
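The three conversions can also be chained into one round trip: RDD to DataFrame to DataSet and back to an RDD. The following is a minimal sketch, not the author's code. The `Person` case class, the `RoundTrip` object and the `roundTrip` helper are hypothetical names introduced for illustration, and `id` is assumed to be a `Long` to keep the encoder simple.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical case class for this sketch. It is declared at the top level,
// outside the method that uses it, so that Spark can derive an encoder for it.
case class Person(id: Long, name: String, age: Int)

object RoundTrip {
  // RDD -> DataFrame -> DataSet -> RDD, returning the rows sorted by id.
  def roundTrip(spark: SparkSession): Array[Person] = {
    // The implicits come from the session instance, so import them after it exists.
    import spark.implicits._

    val rdd: RDD[(Long, String, Int)] =
      spark.sparkContext.makeRDD(Seq((1L, "Bob", 12), (2L, "Bigdataboy", 16)))

    // RDD -> DataFrame: attach column names to the tuple fields.
    val df: DataFrame = rdd.toDF("id", "name", "age")

    // DataFrame -> DataSet: column names and types must match the case class.
    val ds: Dataset[Person] = df.as[Person]

    // DataSet -> RDD: the element type is preserved.
    val back: RDD[Person] = ds.rdd
    back.collect().sortBy(_.id)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("RoundTrip").getOrCreate()
    try roundTrip(spark).foreach(println)
    finally spark.stop()
  }
}
```

Going back from a DataFrame with `.rdd` would give `RDD[Row]` instead, because a DataFrame does not carry the case class type.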
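Overview

The DataFrame is the core dataset of Spark SQL. It maps field names onto an RDD, which makes it look more like a two-dimensional data table.

Creating the SparkSession object

    val ss: SparkSession = SparkSession.builder.master("local[*]").appName("a").getOrCreate()

Creating a DataFrame

There are three ways to create a DataFrame.

One: from a file

Supported file types

    scala> spark.read.
    csv   format   jdbc   json   load   option   options   orc   parquet   schema   table   text   textFile

Create

    scala> val csvdata = spark.read.csv("file:///root/sparkdata/a.csv")
    csvdata: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]

    scala> csvdata.show()
    +---+-------+---+
    |_c0|    _c1|_c2|
    +---+-------+---+
    |  1|    Bob| 12|
    |  2|  Black| 12|
    |  3|Bigdata| 13|
    +---+-------+---+

Two: converting an RDD

Note: any operation that moves between an RDD and a DataFrame or DataSet needs the implicit conversions. Pay particular attention to where the import is placed: it is imported from the SparkSession instance, so it must come after that instance has been created.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Build the Spark SQL session
    val ss: SparkSession = SparkSession.builder.master("local[*]").appName("a").getOrCreate()
    // Note: ss is the SparkSession object, not a package
    import ss.implicits._

Get the SparkContext object from the SparkSession and create an RDD

    // Get the SparkContext object and create the RDD
    val sc: SparkContext = ss.sparkContext
    val listRDD: RDD[(Int, String)] = sc.makeRDD(List((1, "Bob"), (2, "Bigdata"), (3, "Black")))

Convert

toDF(colNames: String*): the arguments are the field names to map onto the columns

    val listDF: DataFrame = listRDD.toDF("id", "name")
    listDF.show()
    -------------
    +---+-------+
    | id|   name|
    +---+-------+
    |  1|    Bob|
    |  2|Bigdata|
    |  3|  Black|
    +---+-------+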
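To see what the column-name arguments of toDF actually change, it can help to compare the schema with and without them. This is a small sketch under assumptions: the `ToDfColumns` object and the `columnNames` helper are hypothetical names, not part of the original post. For an RDD of tuples, calling `toDF()` with no arguments falls back to the tuple field names `_1`, `_2`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDfColumns {
  // Returns the column names of the same RDD converted without and with explicit names.
  def columnNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    // Import the implicits from the session instance, after it has been created.
    import spark.implicits._

    val rdd = spark.sparkContext.makeRDD(Seq((1, "Bob"), (2, "Bigdata"), (3, "Black")))

    // No names given: columns take the tuple field names.
    val unnamed: DataFrame = rdd.toDF()
    // Names given: one name per tuple field, in order.
    val named: DataFrame = rdd.toDF("id", "name")

    (unnamed.columns.toSeq, named.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ToDfColumns").getOrCreate()
    try {
      val (unnamed, named) = columnNames(spark)
      println(unnamed.mkString(", "))
      println(named.mkString(", "))
    } finally spark.stop()
  }
}
```

The number of names passed to toDF has to match the number of tuple fields, otherwise Spark rejects the call at runtime.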