Steps for a user-defined function (UDF):

```scala
1. Define the function;
2. Register the function;
   SparkSession.udf.register(): registers the function under a name so it can be called from SQL text, e.g. in sql();
   functions.udf(): wraps the function as a value that can be used throughout the DataFrame API;
3. Call the function;
```

(1) Requirement: count the number of hobbies each user has

(2) Input data format, `hobbies.txt` (name and hobbies are separated by a tab, hobbies by commas)

```txt
alice	jogging,Coding,cooking
lina	travel,dance
```

(3) Output data format

```txt
alice	jogging,Coding,cooking	3
lina	travel,dance	2
```

(4) Code

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object DefineFun {
  // 1. Create the case class that describes one input record
  case class Hobbies(name: String, hobbies: String)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    // 2. Load the data and create a DataFrame (fields are split on the tab character)
    val infoRdd = sc.textFile("file:///E:\\hadoop\\input\\hobbies.txt")
    val hobbiesDF = infoRdd.map(_.split("\t")).map(p => Hobbies(p(0), p(1))).toDF()
    hobbiesDF.show()
    // +-----+--------------------+
    // | name|             hobbies|
    // +-----+--------------------+
    // |alice|jogging,Coding,co...|
    // | lina|        travel,dance|
    // +-----+--------------------+

    // 3. Create a temporary view
    hobbiesDF.createOrReplaceTempView("hobbies_view")

    // 4. Register with spark.udf.register; the name "hobby_num" is then callable from SQL text
    //    spark.udf.register(funName, anonymous function)
    spark.udf.register("hobby_num", (s: String) => s.split(',').size)

    // 5. Call the function in spark.sql
    spark.sql("select name, hobbies, hobby_num(hobbies) as hobby_num from hobbies_view").show()
    // +-----+--------------------+---------+
    // | name|             hobbies|hobby_num|
    // +-----+--------------------+---------+
    // |alice|jogging,Coding,co...|        3|
    // | lina|        travel,dance|        2|
    // +-----+--------------------+---------+

    // 6. Alternatively, wrap the function with functions.udf and use it in the DataFrame API
    val hobby_num2 = udf((s: String) => s.split(",").size)
    hobbiesDF.select($"name", $"hobbies", hobby_num2($"hobbies").as("hobby_num")).show()
    // +-----+--------------------+---------+
    // | name|             hobbies|hobby_num|
    // +-----+--------------------+---------+
    // |alice|jogging,Coding,co...|        3|
    // | lina|        travel,dance|        2|
    // +-----+--------------------+---------+

    // Release the local Spark resources once the job is done
    spark.stop()
  }
}
```
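
The counting logic in both UDFs is plain Scala, so it can be tried without a Spark session. The sketch below is not part of the original code: `HobbyCount` and `hobbyNum` are hypothetical names, and the handling of `null` and empty values is an assumption about what the job should do with bad rows. It is worth a look because `"".split(",")` yields one empty element, so `s.split(',').size` would report 1 hobby for an empty string and throw a `NullPointerException` for a `null` value.

```scala
object HobbyCount {
  // Hypothetical helper (not in the original): counts the non-empty,
  // comma-separated items in a string and treats null as "no hobbies".
  def hobbyNum(s: String): Int =
    Option(s)                                           // null becomes None
      .map(_.split(",").map(_.trim).count(_.nonEmpty))  // ignore blank items
      .getOrElse(0)

  def main(args: Array[String]): Unit = {
    println(hobbyNum("jogging,Coding,cooking")) // 3
    println(hobbyNum("travel,dance"))           // 2
    println(hobbyNum(""))                       // 0
    println(hobbyNum(null))                     // 0
  }
}
```

If this behaviour is what you want, the same method could be passed to either registration style, for example `spark.udf.register("hobby_num", HobbyCount.hobbyNum _)` or `udf(HobbyCount.hobbyNum _)`.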