Using a Custom Groovy Script Hive UDF
To run a Hive UDF job, you can either run these steps individually in your Hive client within a single session, or compile all the required steps sequentially into an HQL file and run it in one go.
- In your Hive client, log in to the required Hive database.
- Register the JAR file of the Spectrum™ Data & Address Quality for Big Data SDK DIM Module:

  ADD JAR /home/hduser/script/dim.hive.${project.version}.jar;
- Create an alias for the Hive UDF of the CustomGroovyScript job. For example:

  CREATE TEMPORARY FUNCTION customscript as 'com.pb.bdq.dim.process.hive.script.groovy.CustomGroovyScriptExecutionUDF';

  Note: The string in quotes is the class name needed for this job to run.
- Enable or disable Hive fetch task conversion. For example:

  set hive.fetch.task.conversion=none;
- Use hivevar:defaultConfiguration to specify the date, date-time, and time patterns, and assign this configuration to the variable. For example:

  set hivevar:defaultConfiguration='{"datePattern":"M/d/yy", "dateTimePattern":"M/d/yy h:mm a", "timePattern":"h:mm a"}';

  Note: This configuration is optional.
- Specify the header fields of the input table in comma-separated format, and assign them to a variable. For example:

  set hivevar:header='busniessname,recordid';
- Use hivevar:scriptConfigurations to set the Groovy script configurations. These include details such as groovyScriptFile, inputFields, and outputFields. For example:

  set hivevar:scriptConfigurations='[{"groovyScriptFile":"/home/hduser/script/groovy_hive.txt", "inputFields":[{"name":"busniessname","type":"string"}, {"name":"recordid","type":"integer"}], "outputFields":[{"name":"outtan","type":"double"}]}, {"groovyScriptFile":"/home/hduser/script/groovy2.txt", "inputFields":[], "outputFields":[{"name":"outtan2","type":"double"}]}]';
- To run the job and display the output on the console, write a query as in this example:

  SELECT customscript(${hivevar:scriptConfigurations}, "", ${hivevar:header}, InputKeyValue, AddressLine1) FROM groovy_tc1;

  Note: This query returns the output fields as transformed by the Groovy script.

  To run the job and dump the output to a designated file, write a query as in this example:

  INSERT OVERWRITE LOCAL DIRECTORY '/home/hduser/script/output' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE SELECT * FROM (SELECT customscript(${hivevar:scriptConfigurations}, ${hivevar:defaultConfiguration}, ${hivevar:header}, InputKeyValue, AddressLine1) as mygp FROM groovy_tc1) record;

  Note: Use the alias defined earlier for the UDF.
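As noted at the start, the steps above can also be compiled into a single HQL file and run in one go (for example, with `hive -f` or beeline's `-f` option). The following is a minimal sketch of such a file, assuming the JAR path, script paths, and table name (groovy_tc1) used in the examples above; adjust them for your environment:

```sql
-- Register the DIM Module JAR (path is an example; adjust to your install).
ADD JAR /home/hduser/script/dim.hive.${project.version}.jar;

-- Create an alias for the CustomGroovyScript UDF.
CREATE TEMPORARY FUNCTION customscript as 'com.pb.bdq.dim.process.hive.script.groovy.CustomGroovyScriptExecutionUDF';

-- Disable fetch task conversion so the UDF runs as a full job.
set hive.fetch.task.conversion=none;

-- Optional date/date-time/time patterns.
set hivevar:defaultConfiguration='{"datePattern":"M/d/yy", "dateTimePattern":"M/d/yy h:mm a", "timePattern":"h:mm a"}';

-- Header fields of the input table, comma-separated.
set hivevar:header='busniessname,recordid';

-- Groovy script configurations: script file, input fields, output fields.
set hivevar:scriptConfigurations='[{"groovyScriptFile":"/home/hduser/script/groovy_hive.txt", "inputFields":[{"name":"busniessname","type":"string"}, {"name":"recordid","type":"integer"}], "outputFields":[{"name":"outtan","type":"double"}]}]';

-- Run the job and write the transformed output to a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/home/hduser/script/output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
SELECT * FROM (
  SELECT customscript(${hivevar:scriptConfigurations}, ${hivevar:defaultConfiguration}, ${hivevar:header}, InputKeyValue, AddressLine1) as mygp
  FROM groovy_tc1
) record;
```

This sketch uses a single script configuration for brevity; the multi-script form shown in the step above works the same way.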