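Characteristics of each

RDD: see Spark RDD
DataFrame: see Spark DataFrame
DataSet: see Spark DataSet

Import implicit conversions

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

    // Create the SparkSession
    val session = SparkSession.builder.master("local[*]").appName("RDDto").getOrCreate()
    // Import the implicit conversions (needed for toDF, toDS and as[T])
    import session.implicits._
    // Get the SparkContext
    val sc = session.sparkContext

Converting an RDD

RDD:

    val listRDD: RDD[(String, String, Int)] = sc.makeRDD(List(("1", "Bob", 12), ("2", "Bigdataboy", 16)))

To DataFrame, using toDF(column names*):

    // Convert to a DataFrame
    val RDDtoDF: DataFrame = listRDD.toDF("id", "name", "age")

To DataSet

Create a case class:

    case class User(id: BigInt, name: String, age: Int)

Convert:

    // Wrap each row in the case class; id is a String in listRDD, so convert it to BigInt explicitly
    val UserRDD: RDD[User] = listRDD.map { case (id, name, age) => User(BigInt(id), name, age) }
    // Convert to a Dataset
    val RDDtoDS: Dataset[User] = UserRDD.toDS()

Converting a DataFrame

File (indata/data.json):

    {"id":1,"name":"Bigdataboy","age":"18"}
    {"id":2,"name":"Bob","age":"16"}
    {"id":3,"name":"Black","age":"18"}

Create the DataFrame:

    val jsonDF: DataFrame = session.read.json("indata/data.json")

To RDD:

    val toRDD: RDD[Row] = jsonDF.rdd

To DataSet

Case class. The age field is a String because the JSON file stores age as a quoted string, and Spark will not implicitly cast a string column to Int:

    case class User(id: BigInt, name: String, age: String)

Convert by calling as[T] on the DataFrame:

    val jsonToDS: Dataset[User] = jsonDF.as[User]

Converting a DataSet

Case class:

    case class User(id: BigInt, name: String, age: String)

Create the Dataset:

    val UserDS: Dataset[User] = List(User(1, "Bob", "12"), User(2, "Bigdata", "16")).toDS()

To RDD:

    val UserRDD: RDD[User] = UserDS.rdd

To DataFrame:

    val UserDF: DataFrame = UserDS.toDF()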
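The individual conversions can also be chained into a single round trip. The following is only a minimal sketch under a few assumptions: the `Person` case class, the `ConversionRoundTrip` object and its `roundTrip` method are hypothetical names introduced for illustration, and `id` is modelled as `Long` rather than the types used in the snippets above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical case class for this sketch; it lives at top level so Spark can derive an encoder.
case class Person(id: Long, name: String, age: Int)

object ConversionRoundTrip {
  // Hypothetical helper: RDD of tuples -> DataFrame -> Dataset[Person] -> RDD[Person].
  def roundTrip(session: SparkSession): List[Person] = {
    import session.implicits._

    val tupleRDD: RDD[(Long, String, Int)] =
      session.sparkContext.makeRDD(List((1L, "Bob", 12), (2L, "Bigdataboy", 16)))

    // RDD -> DataFrame: column names are supplied explicitly.
    val df: DataFrame = tupleRDD.toDF("id", "name", "age")

    // DataFrame -> Dataset: column names and types must line up with the case class fields.
    val ds: Dataset[Person] = df.as[Person]

    // Dataset -> RDD: the element type is preserved.
    val personRDD: RDD[Person] = ds.rdd

    // Sort on the driver so the result does not depend on partition order.
    personRDD.collect().toList.sortBy(_.id)
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder.master("local[*]").appName("RoundTrip").getOrCreate()
    roundTrip(session).foreach(println)
    session.stop()
  }
}
```

Going through a DataFrame on the way to a Dataset is one option; mapping the RDD to the case class and calling toDS(), as shown earlier, reaches the same result without the untyped intermediate step.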
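Overview

Dataset is Spark's newest data abstraction. It builds on DataFrame and uses a case class to describe the structure of the data, so each field carries a type in addition to its name. A Dataset is a strongly typed collection.

Creation

A Dataset is rarely created directly; it is usually obtained by converting an RDD or a DataFrame.

Direct creation

Create a case class:

    case class User(id: BigInt, name: String, age: String)

Create the Dataset by mapping the data through the case class:

    // Create the Dataset
    val DS: Dataset[User] = List(User(1, "Mary", "13")).toDS()
    DS.show()

Output:

    +---+----+---+
    | id|name|age|
    +---+----+---+
    |  1|Mary| 13|
    +---+----+---+

Conversion from a DataFrame

Create a case class, paying attention to the precision range of the field types:

    case class User(id: BigInt, name: String, age: String)

Convert:

    val jsonDF: DataFrame = session.read.json("indata/data.json")
    // Converting a DataFrame to a Dataset requires the case class first
    val DFtoDS: Dataset[User] = jsonDF.as[User]
    DFtoDS.show()

Output:

    +---+---+----------+
    |age| id|      name|
    +---+---+----------+
    | 18|  1|Bigdataboy|
    | 16|  2|       Bob|
    | 18|  3|     Black|
    +---+---+----------+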
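To illustrate what "strongly typed" means in practice, here is a small sketch of typed operations on a Dataset. It is an assumption-laden example rather than part of the original walkthrough: the `Member` case class, the `TypedDatasetSketch` object and the `namesAtLeast` method are hypothetical, and `age` is declared as `Int` so it can be compared numerically.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class for this sketch; age is an Int so numeric comparison is possible.
case class Member(id: Long, name: String, age: Int)

object TypedDatasetSketch {
  // Hypothetical helper: names of members whose age is at least minAge.
  def namesAtLeast(session: SparkSession, minAge: Int): List[String] = {
    import session.implicits._

    val members: Dataset[Member] =
      List(Member(1L, "Mary", 13), Member(2L, "Bob", 16), Member(3L, "Black", 18)).toDS()

    // Fields are accessed as ordinary Scala members, so a misspelled field name
    // or a type mismatch is rejected by the compiler instead of failing at run time.
    val names: Dataset[String] =
      members.filter((m: Member) => m.age >= minAge).map((m: Member) => m.name)

    names.collect().toList.sorted
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder.master("local[*]").appName("TypedDataset").getOrCreate()
    println(namesAtLeast(session, 16))
    session.stop()
  }
}
```

With a DataFrame the same filter would be written against a column name in a string or Column expression, and a wrong name would only surface when the query is analyzed.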