Extracting Reference Data for UAM Jobs
This section describes how to fetch and extract the reference data sources used by the
Validate Address jobs. The reference data is fetched from the e-Store.
Note: For the Validate Address and Validate Address Global jobs, the reference data must be
placed either on all the data nodes of the Hadoop cluster or on the Hadoop Distributed File
System (HDFS). For the Validate Address Loqate job, the reference data must be placed on one
node, and that location must then be mounted on all the other data nodes.
Validate Address Loqate
- Fetch reference data from the e-Store.
- Extract the contents of the ZIP file.
- Place the extracted files on one node and mount that location on all the other data nodes.
The files are now ready for use in MapReduce and Spark jobs and user-defined functions.
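The Loqate steps above can be sketched as shell commands. The bundle name, the shared directory, the hosting node name (`refhost`), and the use of NFS as the sharing mechanism are all assumptions for illustration; substitute the paths and export method used in your environment.

```shell
# Hypothetical locations; adjust for your environment.
ZIP=/tmp/loqate_reference_data.zip   # bundle downloaded from the e-Store
SHARED=/data/loqate-reference        # directory on the single hosting node

# Extract the contents of the ZIP file on the hosting node.
mkdir -p "$SHARED"
unzip "$ZIP" -d "$SHARED"

# On every OTHER data node, mount the hosting node's directory at the
# same path (assumes the directory is exported over NFS from "refhost").
sudo mkdir -p /data/loqate-reference
sudo mount -t nfs refhost:/data/loqate-reference /data/loqate-reference
```

With the mount in place, every node sees the reference data at the same local path, which is what the Loqate job expects.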
Validate Address Global - Address Doctor
- Fetch the reference data from the e-Store. For Validate Address Global, the reference data is available as these six data bundles:
- UAM - Enhanced International - Americas - Bundle Data
- UAM - Enhanced International - Americas - Bundle Data 2
- UAM - Enhanced International - EMEA - Bundle Data
- UAM - Enhanced International - EMEA - Bundle Data 2
- UAM - Enhanced International - APAC - Bundle Data
- UAM - Enhanced Int - US Cert Subscription
- Download the ZIP files and place them on HDFS. Do not extract the ZIP files before placing them on HDFS.
Note: To place the reference data on local nodes instead, extract the ZIP files and place the contents on all the data nodes. The path must be the same on all the data nodes.
The files are now ready for use in MapReduce and Spark jobs and user-defined functions.
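The Validate Address Global steps above amount to copying the bundle ZIPs onto HDFS unmodified. The directory names below are hypothetical; only the `hdfs dfs` commands themselves are standard.

```shell
# Hypothetical locations; adjust for your environment.
DOWNLOADS=/tmp/uam-global-bundles    # where the six bundle ZIPs were downloaded
HDFS_DIR=/reference-data/uam-global  # target directory on HDFS

# Place the bundle ZIPs on HDFS as-is; do NOT extract them first.
hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put "$DOWNLOADS"/*.zip "$HDFS_DIR"/
```

For the local-node alternative described in the note, you would instead run `unzip` on each data node, extracting to an identical path everywhere.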
Validate Address - C1P
- Fetch these reference data bundles from the e-Store.
- US_SUB
- DPV
- EWS
- LACS
- SUITE
- Extract the data through one of these:
- An interactive utility, using the script installdb_unc.sh (see Extraction through interactive utility)
- A silent script, silentInstalldb_unc.sh (see Extraction using silent script)
The data is extracted to the local or edge node, from where it can be pushed to HDFS for use in MapReduce and Spark jobs and Hive user-defined functions.
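Pushing the extracted C1P data from the edge node to HDFS can be sketched as follows. Both directory paths are hypothetical placeholders; use the output directory chosen when running the extraction script.

```shell
# Hypothetical locations; adjust for your environment.
LOCAL_DIR=/data/c1p-reference        # extraction output on the local/edge node
HDFS_DIR=/reference-data/c1p         # target directory on HDFS

# Copy the extracted reference data from the edge node into HDFS.
hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put "$LOCAL_DIR"/* "$HDFS_DIR"/
```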