Extracting Reference Data for UAM Jobs
This section describes how to fetch and extract the reference data sources used by the
Validate Address jobs. The reference data is fetched from the e-Store.
Note: For the Validate Address and Validate Address Global jobs, the reference data must be
placed either on all the data nodes of the Hadoop cluster or on the Hadoop Distributed File
System (HDFS). For the Validate Address Loqate job, the reference data must be placed on one
node, and that location must then be mounted on all the other data nodes.
Validate Address Loqate
- Fetch reference data from the e-Store.
- Extract the contents of the ZIP file.
- Place the extracted files on one node and mount that location on all the other data nodes.
The files are now ready for use in MapReduce and Spark jobs and user-defined functions.
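The Loqate steps above can be sketched as shell commands. The bundle name, the shared directory, the hosting node name (`refhost`), and the use of NFS as the sharing mechanism are all assumptions for illustration; substitute the paths and export method used in your environment.

```shell
# Hypothetical locations; adjust for your environment.
ZIP=/tmp/loqate_reference_data.zip   # bundle downloaded from the e-Store
SHARED=/data/loqate-reference        # directory on the single hosting node

# Extract the contents of the ZIP file on the hosting node.
mkdir -p "$SHARED"
unzip "$ZIP" -d "$SHARED"

# On every OTHER data node, mount the hosting node's directory at the
# same path (assumes the directory is exported over NFS from "refhost").
sudo mkdir -p /data/loqate-reference
sudo mount -t nfs refhost:/data/loqate-reference /data/loqate-reference
```

With the mount in place, every node sees the reference data at the same local path, which is what the Loqate job expects.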
Validate Address Global - Address Doctor
- Fetch the reference data from the e-Store. For Validate Address Global, the reference data is available as these six data bundles:
- UAM - Enhanced International - Americas - Bundle Data
- UAM - Enhanced International - Americas - Bundle Data 2
- UAM - Enhanced International - EMEA - Bundle Data
- UAM - Enhanced International - EMEA - Bundle Data 2
- UAM - Enhanced International - APAC - Bundle Data
- UAM - Enhanced Int - US Cert Subscription
- Download the ZIP files and place them on HDFS. Do not extract the ZIP files before placing them on HDFS.
Note: To place the reference data on local nodes instead, extract the ZIP files and place the contents on all the data nodes. The path must be the same on all the data nodes.
The files are now ready for use in MapReduce and Spark jobs and user-defined functions.
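The Validate Address Global steps above amount to copying the bundle ZIPs onto HDFS unmodified. The directory names below are hypothetical; only the `hdfs dfs` commands themselves are standard.

```shell
# Hypothetical locations; adjust for your environment.
DOWNLOADS=/tmp/uam-global-bundles    # where the six bundle ZIPs were downloaded
HDFS_DIR=/reference-data/uam-global  # target directory on HDFS

# Place the bundle ZIPs on HDFS as-is; do NOT extract them first.
hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put "$DOWNLOADS"/*.zip "$HDFS_DIR"/
```

For the local-node alternative described in the note, you would instead run `unzip` on each data node, extracting to an identical path everywhere.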
Validate Address - C1P
- Fetch these reference data bundles from the e-Store.
- US_SUB
- DPV
- EWS
- LACS
- SUITE
- Extract the data through one of these:
- An interactive utility, using the script installdb_unc.sh (see Extraction through interactive utility)
- A silent script, silentInstalldb_unc.sh (see Extraction using silent script)
The data is extracted to the local or edge node, from where it can be pushed to HDFS for use in MapReduce and Spark jobs and Hive user-defined functions.
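Pushing the extracted C1P data from the edge node to HDFS can be sketched as follows. Both directory paths are hypothetical placeholders; use the output directory chosen when running the extraction script.

```shell
# Hypothetical locations; adjust for your environment.
LOCAL_DIR=/data/c1p-reference        # extraction output on the local/edge node
HDFS_DIR=/reference-data/c1p         # target directory on HDFS

# Copy the extracted reference data from the edge node into HDFS.
hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put "$LOCAL_DIR"/* "$HDFS_DIR"/
```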