Ensure the Spectrum™ Data & Address Quality for Big Data SDK is installed on your machine.
You can run a Spectrum™ Data & Address Quality for Big Data SDK job using the module-specific JAR files and the configuration files in XML format.
For a list of the module-specific JAR files, see Components of the SDK Java API.
- On a Linux system, open a command prompt. On Windows and Unix systems, open an SSH client, such as PuTTY.
- For a MapReduce job, use the hadoop command.
  Based on the job you wish to run:
  - Pass the name of that module's JAR file.
  - Pass the name of the driver class, RunMRSampleJob.
  - Pass the various configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file, where each file contains multiple configuration properties.
  The syntax of the command is:
  hadoop jar <Name of module JAR file> RunMRSampleJob [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to MapReduce configuration file>] [-output <Path of output directory>]
  For example, for a MapReduce MatchKeyGenerator job:
  hadoop jar amm.core.12.2.jar RunMRSampleJob -config /home/hadoop/matchkey/mkgConfig.xml -input /home/hadoop/matchkey/inputFileConfig.xml -conf /home/hadoop/matchkey/mapReduceConfig.xml -output /home/hadoop/matchkey/outputFileConfig.xml
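Because the argument list can grow long, you may find it convenient to assemble the command in a small launcher script. The sketch below is not part of the SDK; it reuses the paths from the MatchKeyGenerator example above (adjust them to your layout) and echoes the assembled command before submitting, so you can verify the arguments first.

```shell
#!/bin/sh
# Hypothetical launcher for the MapReduce MatchKeyGenerator sample job.
# MODULE_JAR and CONF_DIR are illustrative names, not SDK settings.
MODULE_JAR=amm.core.12.2.jar
CONF_DIR=/home/hadoop/matchkey

# Assemble the hadoop invocation from the documented syntax.
CMD="hadoop jar $MODULE_JAR RunMRSampleJob \
  -config $CONF_DIR/mkgConfig.xml \
  -input $CONF_DIR/inputFileConfig.xml \
  -conf $CONF_DIR/mapReduceConfig.xml \
  -output $CONF_DIR/outputFileConfig.xml"

# Print the command as a dry-run check.
echo "$CMD"

# Uncomment the next line to actually submit the job:
# eval "$CMD"
```

Echoing before submitting makes it easy to spot a mistyped configuration path before the job is handed to the cluster.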
- For a Spark job, use the spark-submit command.
  Based on the job you wish to run:
  - Pass the name of that module's JAR file.
  - Pass the name of the driver class, RunSparkSampleJob.
  - Pass the various configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file, where each file contains multiple configuration properties.
  The syntax of the command is:
  spark-submit --class RunSparkSampleJob <Name of module JAR file> [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to Spark configuration file>] [-output <Path of output directory>]
  For example, for a Spark MatchKeyGenerator job:
  spark-submit --class RunSparkSampleJob amm.core.12.2.jar -config /home/hadoop/spark/matchkey/matchKeyGeneratorConfig.xml -input /home/hadoop/spark/matchkey/inputFileConfig.xml -output /home/hadoop/spark/matchkey/outputFileConfig.xml
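The Spark example can be wrapped in the same dry-run style. This is a sketch, not part of the SDK; the variable names are illustrative, and the paths are taken from the Spark MatchKeyGenerator example above.

```shell
#!/bin/sh
# Hypothetical launcher for the Spark MatchKeyGenerator sample job.
# MODULE_JAR and CONF_DIR are illustrative names, not SDK settings.
MODULE_JAR=amm.core.12.2.jar
CONF_DIR=/home/hadoop/spark/matchkey

# Assemble the spark-submit invocation from the documented syntax.
CMD="spark-submit --class RunSparkSampleJob $MODULE_JAR \
  -config $CONF_DIR/matchKeyGeneratorConfig.xml \
  -input $CONF_DIR/inputFileConfig.xml \
  -output $CONF_DIR/outputFileConfig.xml"

# Print the command as a dry-run check.
echo "$CMD"

# Uncomment the next line to actually submit the job:
# eval "$CMD"
```

If your cluster requires it, standard spark-submit options such as --master can be added before --class; consult your cluster's Spark configuration for the correct values.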
Note: To see a list of argument keys supported for the hadoop or spark-submit commands, run the commands:
hadoop --help
or
spark-submit --help