Using an Open Parser MapReduce Job
- Create an instance of DataNormalizationFactory using its static method getInstance().
- Provide the input and output details for the Open Parser job by creating an instance of OpenParserDetail that specifies the ProcessType. The process type must be MRProcessType. To do so:
  - Configure the parsing rules by creating an instance of OpenParserConfiguration, and set the grammar file path in this instance.
  - Set the Reference Data Path and its location type by creating an instance of ReferenceDataPath. For the available location types, see Enum ReferenceDataPathLocation.
  - Create an instance of OpenParserDetail, passing an instance of type JobConfig, together with the OpenParserConfiguration and ReferenceDataPath instances created earlier, as arguments to its constructor. The JobConfig argument must be an instance of type MRJobConfig.
  - In the OpenParserDetail instance created above, set the details of the input file using its inputPath field:
    - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
    - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
    - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
  - Set the details of the output file using the outputPath field of the OpenParserDetail instance:
    - For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
    - For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
    - For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
  - Set the name of the job using the jobName field of the OpenParserDetail instance.
- To create a MapReduce job, use the DataNormalizationFactory instance created earlier to invoke its createJob() method, passing the OpenParserDetail instance above as the argument. The createJob() method returns a List of ControlledJob instances.
- Run the created job using an instance of JobControl.
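createJob() returns a List of ControlledJob instances. Those names match the classes in Hadoop's org.apache.hadoop.mapreduce.lib.jobcontrol package. Assuming that is the JobControl meant here, its run() method keeps looping until stop() is called. For that reason, callers commonly start it on a separate thread, poll allFinished(), and then call stop(). The Java sketch below reproduces only that control pattern and does not use Hadoop. MiniJobControl and RunJobsSketch are hypothetical stand-ins, and each job is a plain Runnable rather than a ControlledJob.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for Hadoop's JobControl (NOT the real class): it runs
// queued jobs on the thread that calls run() and keeps looping until stop().
class MiniJobControl implements Runnable {
    private final Deque<Runnable> waiting = new ArrayDeque<>();
    private int running = 0;
    private int succeeded = 0;
    private int failed = 0;
    private volatile boolean stopped = false;

    synchronized void addJob(Runnable job) { waiting.add(job); }

    // True once no job is waiting or running; counts are final at that point.
    synchronized boolean allFinished() { return waiting.isEmpty() && running == 0; }

    synchronized int succeededCount() { return succeeded; }

    synchronized int failedCount() { return failed; }

    void stop() { stopped = true; }

    @Override
    public void run() {
        while (!stopped) {
            Runnable job;
            synchronized (this) {
                job = waiting.poll();
                if (job != null) {
                    running++;
                }
            }
            if (job == null) {
                try {
                    Thread.sleep(10); // idle until more work arrives or stop() is called
                } catch (InterruptedException e) {
                    return;
                }
                continue;
            }
            boolean ok;
            try {
                job.run();
                ok = true;
            } catch (RuntimeException e) {
                ok = false; // a job that throws a RuntimeException counts as failed
            }
            synchronized (this) {
                running--;
                if (ok) succeeded++; else failed++;
            }
        }
    }
}

class RunJobsSketch {
    // Returns {succeeded, failed} after every job has finished.
    static int[] runAll(List<Runnable> jobs) {
        MiniJobControl control = new MiniJobControl();
        for (Runnable job : jobs) {
            control.addJob(job);
        }
        // The controller loops until stopped, so run it on its own thread.
        Thread worker = new Thread(control, "job-control");
        worker.setDaemon(true);
        worker.start();
        try {
            while (!control.allFinished()) {
                Thread.sleep(20); // poll, as callers of JobControl.allFinished() do
            }
            control.stop();
            worker.join();
        } catch (InterruptedException e) {
            control.stop();
            Thread.currentThread().interrupt();
        }
        return new int[] { control.succeededCount(), control.failedCount() };
    }

    public static void main(String[] args) {
        List<Runnable> jobs = List.of(
                () -> System.out.println("parse job 1 done"),
                () -> { throw new IllegalStateException("parse job 2 failed"); });
        int[] result = runAll(jobs);
        System.out.println("succeeded=" + result[0] + " failed=" + result[1]);
    }
}
```

Running RunJobsSketch prints `parse job 1 done` and then `succeeded=1 failed=1`. With Hadoop's JobControl, the corresponding calls are addJobCollection(...) on the list returned by createJob(), the same thread-and-poll loop, and getFailedJobList() afterward to check for failures.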
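The input and output substeps apply the same rule for choosing a path class: FilePath for text, OrcFilePath for ORC, and ParquetFilePath for Parquet. The Java sketch below states that rule concretely. It does not call the SDK and returns only the name of the class the steps call for. StorageFormat, PathTypeChooser, and the extension-based format inference are hypothetical. ORC and Parquet datasets are often directories without an extension, so in practice you may need to state the format explicitly.

```java
import java.util.Locale;

// Hypothetical: the three storage formats the input and output steps distinguish.
enum StorageFormat { TEXT, ORC, PARQUET }

class PathTypeChooser {
    // Assumption for illustration only: infer the format from the file extension.
    static StorageFormat inferFormat(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".orc")) {
            return StorageFormat.ORC;
        }
        if (lower.endsWith(".parquet")) {
            return StorageFormat.PARQUET;
        }
        return StorageFormat.TEXT; // anything else is treated as a text file
    }

    // Name of the path class the steps call for, for either inputPath or outputPath.
    static String pathClassFor(StorageFormat format) {
        switch (format) {
            case ORC:
                return "OrcFilePath";     // constructed from the path alone
            case PARQUET:
                return "ParquetFilePath"; // constructed from the path alone
            default:
                return "FilePath";        // constructed with the file's other details too
        }
    }

    public static void main(String[] args) {
        String[] paths = { "/data/in/customers.csv", "/data/in/customers.orc", "/data/out/parsed.parquet" };
        for (String p : paths) {
            System.out.println(p + " -> " + pathClassFor(inferFormat(p)));
        }
    }
}
```

Running PathTypeChooser prints `/data/in/customers.csv -> FilePath`, `/data/in/customers.orc -> OrcFilePath`, and `/data/out/parsed.parquet -> ParquetFilePath`.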