Using a Hive UDF of the Data Normalization Module
To run a Hive UDF job, you can either run these steps individually in your Hive client within a single session, or compile all the required steps sequentially into an HQL file and run it in one go.
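If you choose the single-file approach, you can run the compiled HQL file from within a Hive CLI session with the source command, or non-interactively with hive -f. A minimal sketch, assuming the steps below are saved to a hypothetical file at /home/hadoop/hql/tablelookup_job.hql:
source /home/hadoop/hql/tablelookup_job.hql;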
- In your Hive client, log in to the required Hive database.
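For example, assuming a hypothetical database named customer_db:
USE customer_db;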
- Register the JAR file of the Spectrum™ Data & Address Quality for Big Data SDK DNM Module.
ADD JAR <Directory path>/dnm.hive.${project.version}.jar;
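For example, with the JAR copied to a hypothetical path /home/hadoop/lib (keep ${project.version} as shipped with your SDK), you can confirm the registration with LIST JARS:
ADD JAR /home/hadoop/lib/dnm.hive.${project.version}.jar;
LIST JARS;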
- Create an alias for the Hive UDF of the Data Quality job you wish to run.
Note: The string in quotes is the name of the class required for this job to run. For example:
CREATE TEMPORARY FUNCTION tablelookup as 'com.pb.bdq.dnm.process.hive.tablelookup.TableLookUpUDF';
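To confirm the alias is available in the current session, you can check it with standard Hive commands:
SHOW FUNCTIONS LIKE 'tablelookup';
DESCRIBE FUNCTION tablelookup;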
- Specify the reference data path. The reference data can be placed in any of these ways; an annotated sketch of the JSON fields follows this list.
  - Reference data is on HDFS, and is to be downloaded to a working directory for the jobs.
    - If the reference data is in unarchived file format, set the reference directory as:
      set hivevar:referenceDataDetails='{"referenceDataPathLocation":"HDFS", "dataDir":"./referenceData", "dataDownloader":{"dataDownloader":"DC"}}';
    - If the reference data is in archived format, set the reference directory as:
      set hivevar:referenceDataDetails='{"referenceDataPathLocation":"HDFS", "dataDir":"./referenceData.zip", "dataDownloader":{"dataDownloader":"DC"}}';
  - Reference data is on HDFS, and is to be downloaded to local data nodes for the jobs. In this case, set the reference data directory as:
      set hivevar:referenceDataDetails='{"referenceDataPathLocation":"HDFS", "dataDir":"/home/data/dm/referenceData", "dataDownloader":{"dataDownloader":"HDFS","localFSRepository":"/local/download"}}';
  - Reference data is on a local path. Ensure that the data is present on the same path on each node of the cluster. Set the reference directory as:
      set hivevar:referenceDataDetails='{"referenceDataPathLocation":"LocaltoDataNodes", "dataDir":"/home/data/referenceData"}';
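As a reading aid, here is the first of the settings above annotated with the role each JSON field plays, as inferred from the examples in this list:
-- "referenceDataPathLocation": "HDFS" or "LocaltoDataNodes" -- where the reference data resides
-- "dataDir": directory or archive holding the reference data
-- "dataDownloader": how the data reaches the nodes; "DC" downloads to the job's
--                   working directory, while "HDFS" with "localFSRepository"
--                   downloads to the given local path on each node
set hivevar:referenceDataDetails='{"referenceDataPathLocation":"HDFS", "dataDir":"./referenceData", "dataDownloader":{"dataDownloader":"DC"}}';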
- Specify the configurations and other details for the job, and assign these to the respective variables or configuration properties.
Note: The rule must be in JSON format. For example:
set hivevar:rule='{"rules":[{"action":"Standardize", "source":"CityCode", "tableName":"State Name Abbreviations", "lookupMultipleWordTerms":false, "lookupIndividualTermsWithinField":false, "destination":"CityCode"}]}';
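The same rule, spread across lines as HQL comments for readability; the field roles noted here are inferred from this example:
-- '{"rules":[{
--    "action": "Standardize",                  -- lookup action to perform
--    "source": "CityCode",                     -- input field whose terms are looked up
--    "tableName": "State Name Abbreviations",  -- lookup table to match against
--    "lookupMultipleWordTerms": false,
--    "lookupIndividualTermsWithinField": false,
--    "destination": "CityCode"                 -- field receiving the standardized value
-- }]}'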
Note: Ensure that you use the configuration properties in the respective job configurations, for example, pb.bdq.match.rule, pb.bdq.match.express.column, pb.bdq.consolidation.sort.field, and so on, where indicated in the respective sample HQL files.
- Specify the header fields of the input table in comma-separated format, and assign them to a variable or configuration property.
set hivevar:header='AccountDescription,Address,ApartmentNumber,CityCode';
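The examples that follow query a table named citizen_data. A minimal sketch of such a table, assuming STRING columns matching the header fields above (the schema is illustrative, not prescribed by the SDK):
CREATE TABLE IF NOT EXISTS citizen_data (
accountdescription STRING,
address STRING,
apartmentnumber STRING,
citycode STRING
);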
- To run the job and display the job output on the console, write the query as indicated in this example:
SELECT bar.ret["StandardizationTermIdentified"], bar.ret["accountdescription"], bar.ret["address"], bar.ret["apartmentnumber"], bar.ret["citycode"]
FROM (
SELECT tablelookup(${hivevar:rule}, ${hivevar:referenceDataDetails}, ${hivevar:header}, accountdescription, address, apartmentnumber, citycode) AS ret
FROM citizen_data
) bar;
To run the job and dump the job output in a designated file, write the query as indicated in this example:
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/TableLookup/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
SELECT bar.ret["StandardizationTermIdentified"], bar.ret["accountdescription"], bar.ret["address"], bar.ret["apartmentnumber"], bar.ret["citycode"]
FROM (
SELECT tablelookup(${hivevar:rule}, ${hivevar:referenceDataDetails}, ${hivevar:header}, accountdescription, address, apartmentnumber, citycode) AS ret
FROM citizen_data
) bar;
Note: Use the alias defined earlier for the UDF.
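Putting it all together, a sketch of a complete HQL file for this job; the database name and JAR path are assumptions, and ${project.version} must be replaced with your SDK version before running:
-- tablelookup_job.hql -- run with: hive -f tablelookup_job.hql
USE customer_db;
ADD JAR /home/hadoop/lib/dnm.hive.${project.version}.jar;
CREATE TEMPORARY FUNCTION tablelookup as 'com.pb.bdq.dnm.process.hive.tablelookup.TableLookUpUDF';
set hivevar:referenceDataDetails='{"referenceDataPathLocation":"HDFS", "dataDir":"./referenceData", "dataDownloader":{"dataDownloader":"DC"}}';
set hivevar:rule='{"rules":[{"action":"Standardize", "source":"CityCode", "tableName":"State Name Abbreviations", "lookupMultipleWordTerms":false, "lookupIndividualTermsWithinField":false, "destination":"CityCode"}]}';
set hivevar:header='AccountDescription,Address,ApartmentNumber,CityCode';
SELECT bar.ret["StandardizationTermIdentified"], bar.ret["accountdescription"], bar.ret["address"], bar.ret["apartmentnumber"], bar.ret["citycode"]
FROM (
SELECT tablelookup(${hivevar:rule}, ${hivevar:referenceDataDetails}, ${hivevar:header}, accountdescription, address, apartmentnumber, citycode) AS ret
FROM citizen_data
) bar;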