Ensure the Spectrum™ Data & Address Quality for Big Data SDK is installed on your machine.
You can run a Spectrum™ Data & Address Quality for Big Data SDK job using the module-specific JAR files and the configuration files in XML format.
For a list of the module-specific JAR files, see Components of the SDK Java API.
- On a Linux system, open a command prompt. On Windows and Unix systems, open an SSH client, such as PuTTY.
- For a MapReduce job, use the hadoop command.
  Based on the job you wish to run:
  - Pass the name of that module's JAR file.
  - Pass the name of the driver class, RunMRSampleJob.
  - Pass the various configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file, where each file contains multiple configuration properties.
  The syntax of the command is:
  hadoop jar <Name of module JAR file> RunMRSampleJob [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to MapReduce configuration file>] [-output <Path of output directory>]
  For example, for a MapReduce MatchKeyGenerator job:
  hadoop jar amm.core.12.2.jar RunMRSampleJob -config /home/hadoop/matchkey/mkgConfig.xml -input /home/hadoop/matchkey/inputFileConfig.xml -conf /home/hadoop/matchkey/mapReduceConfig.xml -output /home/hadoop/matchkey/outputFileConfig.xml
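Because the argument list can grow long, you may find it convenient to assemble the command in a small launcher script. The sketch below is not part of the SDK; it reuses the paths from the MatchKeyGenerator example above (adjust them to your layout) and echoes the assembled command before submitting, so you can verify the arguments first.

```shell
#!/bin/sh
# Hypothetical launcher for the MapReduce MatchKeyGenerator sample job.
# MODULE_JAR and CONF_DIR are illustrative names, not SDK settings.
MODULE_JAR=amm.core.12.2.jar
CONF_DIR=/home/hadoop/matchkey

# Assemble the hadoop invocation from the documented syntax.
CMD="hadoop jar $MODULE_JAR RunMRSampleJob \
  -config $CONF_DIR/mkgConfig.xml \
  -input $CONF_DIR/inputFileConfig.xml \
  -conf $CONF_DIR/mapReduceConfig.xml \
  -output $CONF_DIR/outputFileConfig.xml"

# Print the command as a dry-run check.
echo "$CMD"

# Uncomment the next line to actually submit the job:
# eval "$CMD"
```

Echoing before submitting makes it easy to spot a mistyped configuration path before the job is handed to the cluster.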
- For a Spark job, use the spark-submit command.
  Based on the job you wish to run:
  - Pass the name of that module's JAR file.
  - Pass the name of the driver class, RunSparkSampleJob.
  - Pass the various configuration files as a list of arguments. Each argument key accepts the path of a single configuration property file, where each file contains multiple configuration properties.
  The syntax of the command is:
  spark-submit --class RunSparkSampleJob <Name of module JAR file> [-config <Path to configuration file>] [-debug] [-input <Path to input configuration file>] [-conf <Path to Spark configuration file>] [-output <Path of output directory>]
  For example, for a Spark MatchKeyGenerator job:
  spark-submit --class RunSparkSampleJob amm.core.12.2.jar -config /home/hadoop/spark/matchkey/matchKeyGeneratorConfig.xml -input /home/hadoop/spark/matchkey/inputFileConfig.xml -output /home/hadoop/spark/matchkey/outputFileConfig.xml
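The Spark example can be wrapped in the same dry-run style. This is a sketch, not part of the SDK; the variable names are illustrative, and the paths are taken from the Spark MatchKeyGenerator example above.

```shell
#!/bin/sh
# Hypothetical launcher for the Spark MatchKeyGenerator sample job.
# MODULE_JAR and CONF_DIR are illustrative names, not SDK settings.
MODULE_JAR=amm.core.12.2.jar
CONF_DIR=/home/hadoop/spark/matchkey

# Assemble the spark-submit invocation from the documented syntax.
CMD="spark-submit --class RunSparkSampleJob $MODULE_JAR \
  -config $CONF_DIR/matchKeyGeneratorConfig.xml \
  -input $CONF_DIR/inputFileConfig.xml \
  -output $CONF_DIR/outputFileConfig.xml"

# Print the command as a dry-run check.
echo "$CMD"

# Uncomment the next line to actually submit the job:
# eval "$CMD"
```

If your cluster requires it, standard spark-submit options such as --master can be added before --class; consult your cluster's Spark configuration for the correct values.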
Note: To see a list of argument keys supported for the hadoop or spark-submit commands, run the commands:
hadoop --help
or
spark-submit --help