Hive UDFs

There are two ways to write a UDF for Hive:

1) Simple UDF
2) GenericUDF

Use a simple UDF when the function accepts simple primitive types of arguments such as Text, IntWritable, LongWritable, DoubleWritable, etc.

If you want a UDF that accepts complex types such as arrays, lists, sets, or maps, you need to write a GenericUDF.

1) Simple UDF

A simple UDF is a class that extends UDF.

A simple UDF needs the following import statement:

import org.apache.hadoop.hive.ql.exec.UDF;

We also need to add the hive-exec-*.jar JAR from $HIVE_HOME/lib to the build path, along with the JARs required for running a Hadoop program from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.

The following sample UDF accepts a String value from Hive and returns a String value to Hive:

package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SampleUDF extends UDF {

    // Hive calls evaluate once per row with the column value.
    public Text evaluate(Text input) {
        // Pass NULL through unchanged.
        if (input == null) return null;
        return new Text("Hello " + input.toString());
    }
}

As you can see, the evaluate function contains the actual logic. Every simple UDF must define this function.
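The reason the name matters is that Hive looks up evaluate by reflection instead of calling it through an interface, and a UDF class may define more than one evaluate method for different argument types. The sketch below imitates that lookup in plain Java so that it runs without the Hive JARs. HelloLogic, EvaluateLookupSketch and callEvaluate are hypothetical names used only for illustration, String and Integer stand in for Text and IntWritable, and the lookup is a simplification, not Hive's actual resolver.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for SampleUDF: String/Integer replace Text/IntWritable
// so the sketch compiles without the Hive and Hadoop JARs.
class HelloLogic {
    public String evaluate(String input) {
        if (input == null) return null;   // NULL in, NULL out
        return "Hello " + input;
    }

    // A second overload; the argument type decides which one is called.
    public String evaluate(Integer id) {
        if (id == null) return null;
        return "Employee #" + id;
    }
}

class EvaluateLookupSketch {
    // Finds the public method named "evaluate" that takes argType and invokes it.
    // Simplified imitation of a reflective lookup; assumption, not Hive's code.
    static Object callEvaluate(Object udf, Class<?> argType, Object arg) {
        try {
            Method m = udf.getClass().getMethod("evaluate", argType);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            // Thrown, for example, when no matching evaluate method exists.
            throw new IllegalStateException("no usable evaluate method", e);
        }
    }

    public static void main(String[] args) {
        HelloLogic udf = new HelloLogic();
        System.out.println(callEvaluate(udf, String.class, "World"));
        System.out.println(callEvaluate(udf, Integer.class, 7));
        System.out.println(callEvaluate(udf, String.class, null));
    }
}
```

If the class had no method named evaluate, getMethod would fail here, which is roughly the situation Hive is in when a UDF class does not define one.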

To use this UDF in Hive, we first need to add the JAR to the classpath.

Start Hive and type the command below:

add jar /home/user/Data/PROJECTS/Assignment/udf/TestUDF.jar;

This is the path where the JAR file is located.

After adding the JAR to the classpath, we need to create a temporary function pointing to the class in which we have written our UDF. In our case it is SampleUDF.

create temporary function test_udf as 'com.hive.udfs.common.SampleUDF';

Now the last step is to call this UDF from Hive:

hive> select test_udf(empname) from Employee;

And we are done!

2) Generic UDF
A generic UDF is a class that extends GenericUDF.

A generic UDF needs the following import statement:

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;

We also need to add the hive-serde-*.jar JAR from $HIVE_HOME/lib, in addition to hive-exec-*.jar (which contains the GenericUDF class itself), along with the JARs required for running a Hadoop program from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.
