rramos.github.io

07 Sep, 2017 - About 48 seconds

Hive - Unpivot to Avoid Multiple Joins

Intro

In Hive, when a table has multiple similar columns, such as:

product_id image_name1 image_name2 image_name15
1 a.jpg b.jpg c.jpg
2 d.jpg e.jpg f.jpg

and we want to join based on all image_names, the naive approach would be to perform 15 joins (in the running example).

Hive does not have an unpivot functionality, but it is possible to use the Hive builtin UDF explode to accomplish this:
Query example

select x.product_id, x.image_name from (
select product_id, map("image_name1", image_name1, ..., "image_name15", image_name15) as image_map
from table ) x
lateral view explode(image_map) expl as image_nr, image_name

This returns an unpivoted table like below, which allows us to perform a single join:

product_id image_name
1 a.jpg
1 b.jpg
1 c.jpg
2 d.jpg
2 e.jpg
2 f.jpg

Big thanks to Diogo Franco, for this hint ;) check also is Blog Page diogoalexandrefranco.github.io

OLDER > < NEWER