Spark Core Function Example: Top 3 Ad Clicks

Requirement: for each province, find the top 3 ads by number of clicks.

Desired final output:

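Approach: from each sample record, extract the province and the ad, then count how many times each ad was clicked in each province. Finally, group those counts by province, rank the ads by click count, and keep the top three.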
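The steps above can be sketched with plain Scala collections before moving to RDDs. This is only an illustration under stated assumptions: the sample records, the `TopAdsSketch` object, and the `top3PerProvince` helper are hypothetical, and the only thing taken from the original is that field 1 of a space-separated line is the province and field 4 is the ad.

```scala
object TopAdsSketch {
  // Hypothetical sample data: (province, ad, number of clicks to generate).
  // Fields 0, 2 and 3 of each generated line are filler values.
  val sampleLines: Seq[String] = for {
    (province, ad, clicks) <- Seq(
      ("north", "adA", 4), ("north", "adB", 3),
      ("north", "adC", 2), ("north", "adD", 1),
      ("south", "adA", 1)
    )
    _ <- 1 to clicks
  } yield s"0 $province 0 0 $ad"

  def top3PerProvince(lines: Seq[String]): Map[String, List[(String, Int)]] =
    lines
      // Step 1: extract (province, ad) from each record, with a count of 1.
      .map { line =>
        val fields = line.split(" ")
        ((fields(1), fields(4)), 1)
      }
      // Step 2: sum the clicks per (province, ad) pair.
      .groupBy(_._1)
      .map { case (key, ones) => (key, ones.map(_._2).sum) }
      .toList
      // Step 3: group by province, rank by clicks descending, keep three.
      .groupBy { case ((province, _), _) => province }
      .map { case (province, counts) =>
        province -> counts
          .map { case ((_, ad), n) => (ad, n) }
          .sortBy { case (ad, n) => (-n, ad) } // ties broken by ad name
          .take(3)
      }

  def main(args: Array[String]): Unit =
    top3PerProvince(sampleLines).foreach(println)
}
```

In the Spark versions, step 2 corresponds to `reduceByKey` and step 3 to `groupByKey` (or `groupBy`) followed by `mapValues`.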

Implementation 1:

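sc.textFile("input/agent.log")
  // Key each click by "province-ad" (fields 1 and 4 of the line), count 1.
  .map { line => val f = line.split(" "); (f(1) + "-" + f(4), 1) }
  // Sum the clicks per province-ad pair.
  .reduceByKey(_ + _)
  // Re-key by province: (province, (ad, clicks)).
  .map(a => (a._1.split("-")(0), (a._1.split("-")(1), a._2)))
  // Collect each province's ads, sort by clicks descending, keep the top 3.
  .groupByKey()
  .mapValues(data => data.toList.sortWith((a, b) => a._2 > b._2).take(3))
  .collect().foreach(println)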
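As a possible variation, `groupByKey` collects every (ad, clicks) pair of a province into one list before the top three are taken. The sketch below swaps in `aggregateByKey` so that at most three entries per province are kept while merging. It is not the author's method: the `Top3WithAggregate` object, the `keepTop3` helper, and the local-mode configuration are assumptions made for illustration, and it needs Spark on the classpath to run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Top3WithAggregate {
  // Hypothetical helper: sort by clicks descending and keep at most three.
  def keepTop3(ads: List[(String, Int)]): List[(String, Int)] =
    ads.sortBy(-_._2).take(3)

  def main(args: Array[String]): Unit = {
    // Assumed local-mode setup; adjust master and app name as needed.
    val conf = new SparkConf().setMaster("local[*]").setAppName("Top3WithAggregate")
    val sc = new SparkContext(conf)

    sc.textFile("input/agent.log")
      .map { line => val f = line.split(" "); ((f(1), f(4)), 1) }
      .reduceByKey(_ + _)
      .map { case ((province, ad), n) => (province, (ad, n)) }
      // Keep a running top 3 per province instead of grouping everything.
      .aggregateByKey(List.empty[(String, Int)])(
        (top, adCount) => keepTop3(adCount :: top), // within a partition
        (left, right) => keepTop3(left ++ right)    // across partitions
      )
      .collect().foreach(println)

    sc.stop()
  }
}
```

After `reduceByKey` each province holds only one record per ad, so the saving is likely small for this data set; the pattern matters more when the number of values per key is large.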

DAG Visualization

Implementation 2:

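sc.textFile("input/agent.log")
  // Key each click by the tuple (province, ad), count 1.
  .map { line => val f = line.split(" "); ((f(1), f(4)), 1) }
  // Sum the clicks per (province, ad) pair.
  .reduceByKey(_ + _)
  // Group the ((province, ad), clicks) records by province.
  .groupBy(_._1._1)
  // Sort each province's records by clicks descending and keep the top 3.
  .mapValues(a => a.toList.sortWith(_._2 > _._2).take(3))
  .collect().foreach(println)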
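One difference worth noting: because the second implementation groups whole records with `groupBy`, each value in the result is still a `((province, ad), clicks)` tuple, so the province is repeated inside every entry. A minimal sketch of a helper that drops the repeated province while taking the top three follows; the `FlattenGrouped` object and `toAdCounts` name are hypothetical.

```scala
object FlattenGrouped {
  // One record as produced by reduceByKey in the second implementation.
  type Grouped = ((String, String), Int)

  // Sort by clicks descending, keep three, and reduce each record to (ad, clicks).
  def toAdCounts(rows: Iterable[Grouped]): List[(String, Int)] =
    rows.toList
      .sortWith(_._2 > _._2)
      .take(3)
      .map { case ((_, ad), n) => (ad, n) }

  def main(args: Array[String]): Unit = {
    // Hypothetical records for a single province.
    val rows = List((("north", "adA"), 1), (("north", "adB"), 4))
    println(toAdCounts(rows))
  }
}
```

Passing it as `.mapValues(FlattenGrouped.toAdCounts)` in place of the inline lambda would give the same `(province, List((ad, clicks), ...))` shape as the first implementation.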

DAG Visualization

Analysis of the implementations:

  • Copyrights © 2020-2021 ycfn97