Using a Match Key Generator Spark Job
- Create an instance of AdvanceMatchFactory, using its static method getInstance().
- Provide input and output details for the Match Key Generator job by creating an instance of MatchKeyGeneratorDetail, specifying the ProcessType. The instance must use the type SparkProcessType.
  - Specify the match key settings to perform the matching by creating and configuring an instance of MatchKeySettings. For more information, see the relevant code sample.
  - Create an instance of MatchKeyGeneratorDetail by passing an instance of type JobConfig and the MatchKeySettings instance created in the previous step as arguments to its constructor. The JobConfig parameter must be an instance of type SparkJobConfig.
  - Set the details of the input file using the inputPath field of the MatchKeyGeneratorDetail instance.
    - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
    - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
    - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
  - Set the details of the output file using the outputPath field of the MatchKeyGeneratorDetail instance.
    - For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
    - For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
    - For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
  - Set the name of the job using the jobName field of the MatchKeyGeneratorDetail instance.
- To create and run the Spark job, use the previously created instance of AdvanceMatchFactory to invoke its method runSparkJob(), passing the MatchKeyGeneratorDetail instance created above as an argument. The runSparkJob() method runs the job and returns a Map of the reporting counters of the job.
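
The order of the calls in the steps above can be sketched as follows. This is only an illustration under stated assumptions, not the SDK's code sample. Every class below the first one is a hypothetical stand-in, declared so that the sketch compiles without the SDK on the classpath; the real classes may have different constructors, may use setters instead of plain fields, and may not share the class hierarchy assumed here. The Map<String, Long> return type, the file paths, and the job name are likewise assumptions, and the ProcessType mentioned in the steps is not modeled.

```java
import java.util.Collections;
import java.util.Map;

class MatchKeyGeneratorSketch {

    // Builds the job detail in the order given by the steps.
    // The paths and the job name are placeholders.
    static MatchKeyGeneratorDetail buildDetail() {
        // Configure the settings as described in the SDK's own code sample.
        MatchKeySettings settings = new MatchKeySettings();

        // The JobConfig argument must be a SparkJobConfig for a Spark job.
        MatchKeyGeneratorDetail detail =
                new MatchKeyGeneratorDetail(new SparkJobConfig(), settings);

        detail.inputPath = new OrcFilePath("/placeholder/input.orc");
        detail.outputPath = new ParquetFilePath("/placeholder/output.parquet");
        detail.jobName = "MatchKeyGeneratorSketch";
        return detail;
    }

    public static void main(String[] args) {
        AdvanceMatchFactory factory = AdvanceMatchFactory.getInstance();
        Map<String, Long> counters = factory.runSparkJob(buildDetail());
        System.out.println("Reporting counters: " + counters);
    }
}

// ---- Hypothetical stand-ins for the SDK types (assumptions, not the real API) ----

interface JobConfig {}

class SparkJobConfig implements JobConfig {}

class MatchKeySettings {}

class FilePath {
    final String path;
    FilePath(String path) { this.path = path; }
}

// Assumed to extend FilePath only so one field type can hold all three kinds.
class OrcFilePath extends FilePath {
    OrcFilePath(String path) { super(path); }
}

class ParquetFilePath extends FilePath {
    ParquetFilePath(String path) { super(path); }
}

class MatchKeyGeneratorDetail {
    final JobConfig jobConfig;
    final MatchKeySettings settings;
    FilePath inputPath;
    FilePath outputPath;
    String jobName;

    MatchKeyGeneratorDetail(JobConfig jobConfig, MatchKeySettings settings) {
        this.jobConfig = jobConfig;
        this.settings = settings;
    }
}

class AdvanceMatchFactory {
    private static final AdvanceMatchFactory INSTANCE = new AdvanceMatchFactory();

    private AdvanceMatchFactory() {}

    static AdvanceMatchFactory getInstance() { return INSTANCE; }

    // Stub: checks the detail and returns an empty map. The real method runs
    // the Spark job and returns its reporting counters.
    Map<String, Long> runSparkJob(MatchKeyGeneratorDetail detail) {
        if (!(detail.jobConfig instanceof SparkJobConfig)) {
            throw new IllegalArgumentException("JobConfig must be a SparkJobConfig");
        }
        if (detail.inputPath == null || detail.outputPath == null) {
            throw new IllegalStateException("inputPath and outputPath must be set");
        }
        return Collections.emptyMap();
    }
}
```

With the SDK on the classpath, the stand-in classes would be dropped and replaced by imports of the real types; only the body of MatchKeyGeneratorSketch reflects the procedure itself.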