Using a Transactional Match MapReduce Job
- Create an instance of AdvanceMatchFactory, using its static method getInstance().
- Provide the input and output details for the Transactional Match job by creating an instance of TransactionalMatchDetail that specifies the ProcessType. The instance must use the type MRProcessType.
  - Specify the column by which the records are to be grouped by creating an instance of GroupbyOption. Use an instance of GroupbyMROption to specify the group-by column and the number of reducers required.
  - Generate the matching rules for the job by creating an instance of MatchRule.
  - Create an instance of TransactionalMatchDetail by passing an instance of type JobConfig, the GroupbyOption instance, and the MatchRule instance created above as the arguments to its constructor. The JobConfig parameter must be an instance of type MRJobConfig.
  - Set the details of the input file using the inputPath field of the TransactionalMatchDetail instance.
    - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
    - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
    - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
  - Set the details of the output file using the outputPath field of the TransactionalMatchDetail instance.
    - For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
    - For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
    - For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
  - Set the name of the job using the jobName field of the TransactionalMatchDetail instance.
  - Set the returnUniqueCandidates flag of the TransactionalMatchDetail instance to true to return unique candidate records in the output. The default is true.
  - Set the compressOutput flag of the TransactionalMatchDetail instance to true to compress the output of the job.
  - If the input data does not have match keys, you must specify match key settings so that the Match Key Generator job first generates the match keys, before the Transactional Match job runs. To do this, create and configure an instance of MatchKeySettings, and set it using the matchKeySettings field of the TransactionalMatchDetail instance.
    Note: To see how to set match key settings, see the code samples.
- To create a MapReduce job, use the previously created instance of AdvanceMatchFactory to invoke its method createJob(), passing the above instance of TransactionalMatchDetail as an argument. The createJob() method creates the job and returns a List of instances of ControlledJob.
- Run the created job using an instance of JobControl.
- To display the reporting counters after a successful MapReduce job run, use the previously created instance of AdvanceMatchFactory to invoke its method getCounters(), passing the created job as an argument.
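The group-by step determines which records the job compares with one another. The Java sketch below is conceptual only: it does not call the SDK, and the Row type, the postcode column, and the groupBy helper are hypothetical. It assumes, as is usual for grouped matching, that records are only compared with others in the same group, and shows the idea of partitioning records by the value of the group-by column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of grouping by a column; it does not use the Spectrum SDK.
// Row and the "postcode" column are hypothetical.
public class GroupBySketch {

    // A minimal record: an id plus named column values.
    record Row(String id, Map<String, String> columns) {}

    // Partitions rows by the value of one column, preserving input order.
    // Under the stated assumption, only rows in the same group are compared.
    static Map<String, List<Row>> groupBy(List<Row> rows, String column) {
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            String key = r.columns().getOrDefault(column, "");
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("1", Map.of("name", "Ann Lee", "postcode", "10001")),
            new Row("2", Map.of("name", "Anne Lee", "postcode", "10001")),
            new Row("3", Map.of("name", "Bob Roy", "postcode", "94105")));

        // Prints "10001 -> [1, 2]" and then "94105 -> [3]".
        groupBy(rows, "postcode").forEach((key, members) ->
            System.out.println(key + " -> " + members.stream().map(Row::id).toList()));
    }
}
```

In the MapReduce job itself, the number of reducers set on GroupbyMROption controls how many reduce tasks these groups are spread across.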
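The input and output steps choose a path class by file format: FilePath for text, OrcFilePath for ORC, and ParquetFilePath for Parquet. As a hypothetical illustration, the helper below maps a file name to the class those steps name. Deciding the format from the extension is an assumption made only for this sketch. With the SDK you choose and construct the class yourself, and a text FilePath also needs the other file details its constructor asks for.

```java
import java.util.Locale;

// Hypothetical helper: maps a file path to the SDK path class named in the steps.
// The extension-based rule is an assumption for illustration only.
public class PathTypeSketch {

    // Which path class each format calls for, per the input/output steps.
    enum PathKind {
        TEXT("FilePath"),
        ORC("OrcFilePath"),
        PARQUET("ParquetFilePath");

        final String sdkClass;

        PathKind(String sdkClass) {
            this.sdkClass = sdkClass;
        }
    }

    // Decides the format from the file extension only (case-insensitive).
    static PathKind pathKindFor(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".orc")) {
            return PathKind.ORC;
        }
        if (lower.endsWith(".parquet")) {
            return PathKind.PARQUET;
        }
        return PathKind.TEXT; // anything else is treated as a text file
    }

    public static void main(String[] args) {
        String[] paths = {"/data/in/customers.txt", "/data/in/customers.orc", "/data/out/matches.parquet"};
        for (String p : paths) {
            System.out.println(p + " -> " + pathKindFor(p).sdkClass);
        }
    }
}
```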
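The match key step relies on similar records sharing a key so that they fall into the same group. The sketch below uses a hypothetical key rule: the first three letters of the last name, followed by the first three characters of the postal code. It is not the Match Key Generator's algorithm, and the matchKey method is invented for illustration. It shows how two spellings of the same name can produce the same key.

```java
import java.util.Locale;

// Conceptual match key sketch; NOT the Match Key Generator's algorithm.
// Hypothetical rule: first 3 letters of the last name (letters only, upper-cased)
// followed by the first 3 characters of the postal code.
public class MatchKeySketch {

    static String matchKey(String lastName, String postalCode) {
        String letters = lastName.replaceAll("[^A-Za-z]", "").toUpperCase(Locale.ROOT);
        String namePart = letters.substring(0, Math.min(3, letters.length()));
        String postPart = postalCode.substring(0, Math.min(3, postalCode.length()));
        return namePart + postPart;
    }

    public static void main(String[] args) {
        System.out.println(matchKey("O'Neill", "10001")); // ONE100
        System.out.println(matchKey("Oneil", "10002"));   // ONE100, same group as above
    }
}
```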
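Hadoop MapReduce clients usually drive JobControl (org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl) as follows. It implements Runnable, so it runs on its own thread while the caller polls allFinished() and then calls stop(). To keep the sketch runnable without Hadoop, StubJobControl is a hypothetical stand-in that mirrors only those methods. With the real class, you would first add the List of ControlledJob returned by createJob(), for example with addJobCollection(), and then start the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the usual JobControl polling pattern. StubJobControl is a
// hypothetical stand-in so this compiles without Hadoop on the classpath.
public class JobControlSketch {

    // Mirrors only the JobControl methods used below: run(), allFinished(), stop().
    static class StubJobControl implements Runnable {
        private final AtomicInteger remainingJobs;
        private final AtomicBoolean stopped = new AtomicBoolean(false);

        StubJobControl(int jobs) {
            this.remainingJobs = new AtomicInteger(jobs);
        }

        @Override
        public void run() {
            // Pretend to complete one job every 50 ms until done or stopped.
            while (!stopped.get() && remainingJobs.get() > 0) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                remainingJobs.decrementAndGet();
            }
        }

        boolean allFinished() {
            return remainingJobs.get() == 0;
        }

        void stop() {
            stopped.set(true);
        }
    }

    // Runs the controller on its own thread, polls until all jobs finish, then stops it.
    static boolean runAndWait(StubJobControl control, long pollMillis) {
        Thread runner = new Thread(control, "job-control");
        runner.setDaemon(true);
        runner.start();
        try {
            while (!control.allFinished()) {
                Thread.sleep(pollMillis);
            }
            control.stop();
            runner.join();
        } catch (InterruptedException e) {
            control.stop();
            Thread.currentThread().interrupt();
            return false;
        }
        return control.allFinished();
    }

    public static void main(String[] args) {
        System.out.println("all jobs finished: " + runAndWait(new StubJobControl(3), 20));
    }
}
```

With the real JobControl, allFinished() also returns true when jobs have failed. Checking getFailedJobList() before calling getCounters() is therefore a sensible extra step.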