lateral view explode

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档


前言

hive sql中,UDTF函数可以将一行数据转换为多行,在查询时,如果与其他列同时出现,会报如下所示错误信息。 org.apache.hadoop.hive.ql.parse.SemanticException:UDTF’s are not supported outside the SELECT clause, nor nested in expressions

一、介绍

explode函数:将一行的数据展开成多行(一进多出)

lateral view关键字:将展开后的数据与原表字段进行横向连接

二、使用步骤

数据

  val data = Seq(
      ("电影1", "演员1,演员2"),
      ("电影2", "演员3,演员4")
    )
val df = data.toDF("movie_name", "actors")

val resultDF = spark.sql(
  """
    |select movie_name, el_actors as actor
    |from movies lateral view explode(split(`actors`, ',')) t as el_actors
  """.stripMargin)

// TODO 结果
+--------+-----+
|movie_name| actor|
+--------+-----+
|   电影1| 演员1|
|   电影1| 演员2|
|   电影2| 演员3|
|   电影2| 演员4|
+--------+-----+
split 函数将演员字段按逗号进行分割,配合explode,lateral view 很好用

2.一些例子

// 统计每个演员的评分
val actorAverageScoreData = spark.sql(
  """
    |select actor as name, round(avg(cast(rating as double)), 1) as value
    |from(
    |     select element as actor, rating
    |     from movies lateral view explode(split(`actors`, ', ')) t as element
    |)subquery
    |group by actor order by value desc
    |""".stripMargin)
// 统计每个演员出现的次数     
val averageData = spark.sql(
  """
    |select actor as name, count(*) as value
    |from(
    |     select element as actor
    |     from movies lateral view explode(split(`actors`, ', ')) t as element
    |)subquery
    |group by actor order by value desc
    |""".stripMargin)

三、总结

感觉会用到很多,所以记录一下

大数据

BeanCopyUtils.copyList(dataList, esClass)报错

2025-3-1 23:12:03

大数据

ai技术如何改变医疗行业发展趋势(ai在医疗领域的发展前景)

2025-3-1 23:12:05

0 条回复 A文章作者 M管理员
    暂无讨论,说说你的看法吧