Skip to main content

Hive - Unpivot to avoid multiple joins

·156 words·1 min·
HDFS Hive Optimization Big Data
Table of Contents

This article is about how to Unpivot tables in Hive to avoid multiple joins

Intro
#

In Hive, when a table has multiple similar columns, such as:

product_idimage_name1image_name2image_name15
1a.jpgb.jpgc.jpg
2d.jpge.jpgf.jpg

and we want to join based on all image_names, the naive approach would be to perform 15 joins (in the running example).

Hive does not have an unpivot functionality, but it is possible to use the Hive builtin UDF explode to accomplish this: Query example

select x.product_id, x.image_name from (
  select product_id, map("image_name1", image_name1, ..., "image_name15", image_name15) as image_map
  from table ) x
lateral view explode(image_map) expl as image_nr, image_name

This returns an unpivoted table like below, which allows us to perform a single join:

product_idimage_name
1a.jpg
1b.jpg
1c.jpg
2d.jpg
2e.jpg
2f.jpg

Big thanks to Diogo Franco, for this hint ;) check also is Blog Page diogoalexandrefranco.github.io