Ensure the Spectrum™ Data & Address Quality for Big Data SDK is installed on your machine.
-
Create a Java project that uses the SDK, using one of these
methods:
-
Create a dedicated Java project to run a single Data Quality
operation.
With this method, you need a separate Java project for
each Data Quality job you want to run.
-
Create a common Java project that can run any of the Data Quality
operations, selected through the corresponding runtime arguments.
With this method, you need only one Java project.
-
Import the module-specific JAR file of the Spectrum™ Data & Address Quality for Big Data SDK into your
project. For a list of the module-specific JAR files, see Components of the SDK Java API.
-
Import the required Hadoop JAR files into your project.
-
Create your application to run the desired Data Quality jobs, with appropriate
configurations.
-
Build your project using a build tool such as Maven or Ant.
The build produces a JAR file of your project, for example
MatchKeyGeneratorClient-with-dependencies.jar.
-
Place your project's JAR file on the Hadoop platform.
-
On the Hadoop platform, at a command prompt, change to the directory
where you placed your JAR file.
-
Run the JAR of your project using the command:
hadoop jar <name of the JAR of your client project> <fully qualified name of the main class>
For example:
hadoop jar MatchKeyGeneratorClient-with-dependencies.jar com.company.bdq.amm.mr.MatchKeyGeneratorJob
The job is created and executed on the Hadoop platform. Your Java
application reads the input data from the specified input path on the Hadoop platform,
and the output of the job is written to a file at the specified output path on the
Hadoop platform.
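
The "common Java project" method described in the steps above could be organized around an entry point like the following. This is only a minimal sketch: the class name DataQualityLauncher, the operation names ("matchkey", "other"), and the argument order are assumptions made for illustration and are not part of the SDK. Each handler only returns a description of what it would run; in a real project it would configure and submit the corresponding SDK job.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Hypothetical entry point for a common client project.
 * The operation names and handlers are placeholders, not SDK classes.
 */
public class DataQualityLauncher {

    // Maps the first runtime argument to a handler that receives
    // (inputPath, outputPath). Real handlers would build and run an SDK job.
    private static final Map<String, BiFunction<String, String, String>> OPERATIONS =
            new LinkedHashMap<>();

    static {
        OPERATIONS.put("matchkey", (in, out) -> "matchkey job: " + in + " -> " + out);
        OPERATIONS.put("other", (in, out) -> "other job: " + in + " -> " + out);
    }

    /** Selects the handler named by args[0] and applies it to the two paths. */
    public static String dispatch(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "Usage: <operation> <inputPath> <outputPath>");
        }
        String operation = args[0].toLowerCase(Locale.ROOT);
        BiFunction<String, String, String> handler = OPERATIONS.get(operation);
        if (handler == null) {
            throw new IllegalArgumentException(
                    "Unknown operation '" + args[0] + "'; expected one of "
                            + OPERATIONS.keySet());
        }
        return handler.apply(args[1], args[2]);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(args));
    }
}
```

With a layout like this, any arguments placed after the main class name in the hadoop jar command are passed to main, so the operation and paths could be supplied as, for example, matchkey followed by an input path and an output path.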