07 Sep, 2017 - About 48 seconds
Hive - Unpivot to Avoid Multiple Joins
Intro
In Hive, when a table has multiple similar columns, such as:
product_id | image_name1 | image_name2 | … | image_name15 |
---|---|---|---|---|
1 | a.jpg | b.jpg | … | c.jpg |
2 | d.jpg | e.jpg | … | f.jpg |
and we want to join based on all image_names, the naive approach would be to perform 15 joins (in the running example).
Hive does not have an unpivot functionality, but it is possible to use the Hive builtin UDF explode to accomplish this:
Query example
select x.product_id, x.image_name from ( |
This returns an unpivoted table like below, which allows us to perform a single join:
product_id | image_name |
---|---|
1 | a.jpg |
1 | b.jpg |
1 | c.jpg |
2 | d.jpg |
2 | e.jpg |
2 | f.jpg |
Big thanks to Diogo Franco, for this hint ;) check also is Blog Page diogoalexandrefranco.github.io