Today I was going to use a simple sha256 function in Hive to mask a column, and apparently the Hive version shipped in the latest Cloudera distribution doesn’t have that native function.
This article explains how you can build a sha256 (or any other) UDF and add it to Hive.
Checking Cloudera Packages Version
Check the following URL in order to see the latest shipped package versions in Cloudera.
CDH 5.12 -> hive-1.1.0+cdh5.12.1+1197
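As a rough illustration, the upstream Hive version can be read from a package string like the one above and compared with 1.3.0, the release in which the Hive documentation says sha2 was introduced. This is only a sketch: the class and method names are made up for this post, and it assumes the package string always has the form hive-<upstream version>+<vendor suffix>. Distributions can backport features, so a check like this is a hint rather than a guarantee.

```java
// Hypothetical helper (not part of Hive or Cloudera tooling).
// Assumes package strings shaped like "hive-1.1.0+cdh5.12.1+1197".
final class HiveVersionCheck {

    // Extracts the upstream version, e.g. "hive-1.1.0+cdh5.12.1+1197" -> {1, 1, 0}.
    static int[] upstreamVersion(String pkg) {
        String base = pkg.substring(pkg.indexOf('-') + 1); // drop the "hive-" prefix
        int plus = base.indexOf('+');
        if (plus >= 0) {
            base = base.substring(0, plus); // drop the "+cdh..." suffix
        }
        String[] parts = base.split("\\.");
        int[] version = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            version[i] = Integer.parseInt(parts[i]);
        }
        return version;
    }

    // True when version >= major.minor.patch, comparing component by component.
    static boolean isAtLeast(int[] version, int major, int minor, int patch) {
        int[] wanted = {major, minor, patch};
        for (int i = 0; i < wanted.length; i++) {
            int have = i < version.length ? version[i] : 0;
            if (have != wanted[i]) {
                return have > wanted[i];
            }
        }
        return true;
    }

    // sha2 is documented as available from Hive 1.3.0.
    static boolean hasNativeSha2(String pkg) {
        return isAtLeast(upstreamVersion(pkg), 1, 3, 0);
    }

    public static void main(String[] args) {
        System.out.println(hasNativeSha2("hive-1.1.0+cdh5.12.1+1197"));
    }
}
```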
Return Type: string
Name(Signature): sha2(string/binary, int)
Description: Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512) (as of Hive 1.3.0). The first argument is the string or binary to be hashed. The second argument indicates the desired bit length of the result, which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). SHA-224 is supported starting from Java 8. If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL. Example: sha2('ABC', 256) = 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'.
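To make the documented behaviour concrete, here is a minimal plain-Java sketch of the same semantics using java.security.MessageDigest. It is not the Hive implementation: the class name Sha2Sketch is made up for this post, it only handles string input, and it assumes the string is hashed as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Hive): mirrors the documented sha2 rules in plain Java.
final class Sha2Sketch {

    // Returns the lowercase hex digest, or null for null input or an unsupported bit length.
    static String sha2(String input, Integer bitLength) {
        if (input == null || bitLength == null) {
            return null;
        }
        int len = (bitLength == 0) ? 256 : bitLength; // 0 is documented as equivalent to 256
        if (len != 224 && len != 256 && len != 384 && len != 512) {
            return null;
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-" + len);
            // Assumption: the input string is hashed as its UTF-8 bytes.
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // The runtime does not provide this algorithm (e.g. SHA-224 before Java 8).
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(sha2("ABC", 256));
        System.out.println(sha2("ABC", 0));   // same digest as the 256 case
        System.out.println(sha2("ABC", 100)); // unsupported length
    }
}
```

Returning null instead of throwing keeps the sketch close to the documented contract, where bad arguments produce NULL rather than failing the query.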
This serves mostly as an exercise; one could create a more complex UDF. For the time being, let’s create a GenericUDFSha2 based on the existing Hive 1.3.0 version:
/**
 * GenericUDFSha2.
 *
 */
@Description(name = "sha2",
    value = "_FUNC_(string/binary, len) - Calculates the SHA-2 family of hash functions "
    + "(SHA-224, SHA-256, SHA-384, and SHA-512).",
    extended = "The first argument is the string or binary to be hashed. "
    + "The second argument indicates the desired bit length of the result, "
    + "which must have a value of 224, 256, 384, 512, or 0 (which is equivalent to 256). "
    + "SHA-224 is supported starting from Java 8. "
    + "If either argument is NULL or the hash length is not one of the permitted values, the return value is NULL.\n"
    + "Example: > SELECT _FUNC_('ABC', 256);\n 'b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78'")
public class GenericUDFSha2 extends GenericUDF {
  private transient Converter[] converters = new Converter[2];
  private transient PrimitiveCategory[] inputTypes = new PrimitiveCategory[2];
  private final Text output = new Text();
  private transient boolean isStr;
  private transient MessageDigest digest;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    checkArgsSize(arguments, 2, 2);

    // the function should support both string and binary input types
    checkArgGroups(arguments, 0, inputTypes, STRING_GROUP, BINARY_GROUP);
    checkArgGroups(arguments, 1, inputTypes, NUMERIC_GROUP);
    // ...
Next, in your Hive session, you need to ADD JAR and create a FUNCTION or TEMPORARY FUNCTION:
ADD JAR ./target/GenericUDFSha2-1.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION sha2 AS 'com.rramos.bigdata.utils.GenericUDFSha2';
SELECT sha2(foo, 256) FROM bar LIMIT 1;
Matthew Rathbone’s blog has some great tutorials on Hive functions. Take a look if you want to go deeper.