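Collaborative filtering is commonly used in recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix.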
<br/>
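MLlib currently supports model-based collaborative filtering, in which users and items are described by a small set of latent factors, and these factors are also used to predict the missing entries.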
<br/>
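To make the latent-factor idea concrete, here is a minimal sketch in plain Scala (no Spark required). It assumes each user and each item is described by a rank-2 factor vector and that a predicted rating is the dot product of the two vectors. The object name `LatentFactorSketch` and the factor values are hypothetical and hand-picked for illustration; they are not learned by any algorithm and are not what MLlib would produce.

```scala
object LatentFactorSketch {
  // Hypothetical, hand-picked user factors (rank 2); not learned from data.
  val userFactors: Map[Int, Array[Double]] = Map(
    1 -> Array(2.0, 0.5),
    2 -> Array(2.0, 0.5)
  )

  // Hypothetical, hand-picked item factors (rank 2); not learned from data.
  val itemFactors: Map[Int, Array[Double]] = Map(
    1 -> Array(2.0, 2.0),
    2 -> Array(0.5, 0.0),
    3 -> Array(2.0, 2.0),
    4 -> Array(0.5, 0.0)
  )

  // A predicted rating is the dot product of the user and item factor vectors.
  def predict(user: Int, item: Int): Double =
    userFactors(user).zip(itemFactors(item)).map { case (u, v) => u * v }.sum

  def main(args: Array[String]): Unit = {
    println(predict(1, 1)) // an observed entry: prints 5.0
    println(predict(2, 2)) // an entry with no observed rating: prints 1.0
  }
}
```

Because every entry of the matrix can be computed from the factors, entries with no observed rating get a prediction in the same way as observed ones.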
**1. Collaborative filtering**
(1) Data: `$SPARK_HOME/data/mllib/als/test.data`. Each line has the form `user id,item id,rating`; the first few lines look like this:
```txt
1,1,5.0
1,2,1.0
1,3,5.0
1,4,1.0
2,1,5.0
```
(2) Code
```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object ALSAlgorithm {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName(this.getClass.getName)
    val sc: SparkContext = SparkContext.getOrCreate(conf)

    // Load the data and parse each "user,item,rating" line into a Rating
    val data: RDD[String] = sc.textFile("F:/mllib/test.data")
    val ratings: RDD[Rating] = data.map(_.split(",") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })

    // Number of latent factors (rank)
    val rank = 1
    // Number of iterations
    val numIterations = 20
    // Train the model; the last argument (0.01) is the regularization parameter lambda
    val model: MatrixFactorizationModel = ALS.train(ratings, rank, numIterations, 0.01)

    // The model can be saved
    // model.save(sc, "F:/mllib/model/als")
    // Load a saved model
    // val model: MatrixFactorizationModel = MatrixFactorizationModel.load(sc, "F:/mllib/model/als")

    // Extract the (user, product) pairs to predict
    val usersProducts: RDD[(Int, Int)] = ratings.map { case Rating(user, product, _) => (user, product) }

    // Predict with the model
    val predictions: RDD[((Int, Int), Double)] = model.predict(usersProducts).map {
      case Rating(user, product, rate) => ((user, product), rate)
    }

    // Join the actual ratings with the predicted ratings on (user, product)
    val ratesAndPreds: RDD[((Int, Int), (Double, Double))] = ratings.map {
      case Rating(user, product, rate) => ((user, product), rate)
    }.join(predictions)

    // Evaluate the recommendation model by the mean squared error (MSE) of the predicted ratings
    val MSE = ratesAndPreds.map {
      case ((_, _), (r1, r2)) => math.pow(r1 - r2, 2)
    }.reduce(_ + _) / ratesAndPreds.count()
    println(MSE) // 4.000268960412628

    sc.stop()
  }
}
```
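The evaluation step in the listing above joins actual and predicted ratings on the `(user, product)` key and averages the squared differences. As a rough sketch of the same calculation on ordinary Scala collections (no Spark required), the following may help; the object name `MseSketch`, the helper `meanSquaredError`, and the predicted values are hypothetical and made up for illustration.

```scala
object MseSketch {
  type Key = (Int, Int) // (user, product)

  // Mean squared error over the keys present in both maps,
  // which mirrors the inner join used in the Spark listing.
  def meanSquaredError(actual: Map[Key, Double], predicted: Map[Key, Double]): Double = {
    val squaredErrors: Seq[Double] = actual.toSeq.collect {
      case (key, rating) if predicted.contains(key) =>
        math.pow(rating - predicted(key), 2)
    }
    require(squaredErrors.nonEmpty, "no overlapping (user, product) pairs")
    squaredErrors.sum / squaredErrors.size
  }

  def main(args: Array[String]): Unit = {
    val actual = Map((1, 1) -> 5.0, (1, 2) -> 1.0)
    // Made-up predictions, not the output of any trained model.
    val predicted = Map((1, 1) -> 4.0, (1, 2) -> 3.0)
    println(meanSquaredError(actual, predicted)) // (1.0 + 4.0) / 2, prints 2.5
  }
}
```

A lower MSE means the predicted ratings are closer to the observed ones on the pairs that were compared.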