Using a Validate Address Global MapReduce Job
- Create an instance of GlobalAddressingFactory using its static method getInstance().
- Provide the input and output details for the Validate Address Global job by creating an instance of GlobalAddressingDetail that specifies the ProcessType. The instance must use the type MRProcessType. The steps are as follows:
  - Configure the JVM initialization settings by creating an instance of GlobalAddressingGeneralConfiguration.
  - Set the details of the Reference Data path by creating an instance of ReferenceDataPath. See Enum ReferenceDataPathLocation.
  - Configure the necessary database settings by creating an instance of GlobalAddressingEngineConfiguration, passing the above ReferenceDataPath instance as an argument.
    - Set the preloading type in this instance using Enum PreloadingType.
    - Set the database type using Enum DatabaseType.
    - Set the supported countries using Enum CountryCodes.
    - If all countries are supported, set the isAllCountries attribute to true. Otherwise, specify a comma-separated list of Enum CountryCodes values in the supportedCountries String value.
  - Configure the input settings by creating an instance of GlobalAddressingInputConfiguration. To set the values of the various fields of this instance, use Enum CountryCodes, Enum StateProvinceType, Enum CountryType, Enum PreferredScript, Enum PreferredLanguage, Enum Casing, Enum OptimizationLevel, Enum Mode, and Enum MatchingScope, as applicable.
  - Set the unlock key for the data as a String value in a List.
  - Create an instance of GlobalAddressingDetail, passing an instance of type Config, the List of unlock code values, the GlobalAddressingEngineConfiguration instance, and the GlobalAddressingInputConfiguration instance created earlier as the arguments to its constructor. The Config parameter must be an instance of type MRJobConfig. The value of GROUPBY_REGION in this parameter is set to true by default. The jobs process the addresses of those regions for which you have added reference data. For example, the input addresses of Germany are processed if reference data for Germany is placed on HDFS.
    - Set the JVM initialization configurations by setting the generalConfiguration field of the GlobalAddressingDetail instance to the GlobalAddressingGeneralConfiguration instance created above.
    - Set the details of the input file using the inputPath field of the GlobalAddressingDetail instance.
      Note:
      - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
      - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
      - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
    - Set the details of the output file using the outputPath field of the GlobalAddressingDetail instance.
      Note:
      - For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
      - For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
      - For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
    - Set the name of the job using the jobName field of the GlobalAddressingDetail instance.
- To create a MapReduce job, use the previously created instance of GlobalAddressingFactory to invoke its method createJob(), passing the above instance of GlobalAddressingDetail as an argument. The createJob() method returns a List of ControlledJob instances.
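- Run the created job using an instance of JobControl. Assuming these are Hadoop's ControlledJob and JobControl classes (org.apache.hadoop.mapreduce.lib.jobcontrol), a common approach is to add the returned jobs with addJobCollection(), start the JobControl on a separate thread (it implements Runnable), poll allFinished() until it returns true, and then call stop().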
- To display the reporting counters after a successful MapReduce job run, use the previously created instance of GlobalAddressingFactory to invoke its method getCounters(), passing the created job as an argument.
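The engine-configuration step takes either isAllCountries set to true or a comma-separated supportedCountries string built from Enum CountryCodes values. The following standard-library sketch shows one way to derive both values from a set of selected countries. The CountryCode enum and its constants (DEU, FRA, USA) are hypothetical stand-ins for the SDK's Enum CountryCodes, and the real constant names should be taken from the SDK reference.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

public class SupportedCountriesSketch {

    // Hypothetical stand-in for the SDK's Enum CountryCodes; real constant names may differ.
    enum CountryCode { DEU, FRA, USA }

    // True when the selection covers every constant, i.e. isAllCountries should be set to true.
    static boolean isAllCountries(Set<CountryCode> selected) {
        return selected.containsAll(EnumSet.allOf(CountryCode.class));
    }

    // Joins the selected constants into a comma-separated string for supportedCountries.
    // When the set is an EnumSet, iteration follows declaration order, so the output is stable.
    static String supportedCountries(Set<CountryCode> selected) {
        return selected.stream()
                .map(Enum::name)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Set<CountryCode> selected = EnumSet.of(CountryCode.FRA, CountryCode.DEU);
        if (isAllCountries(selected)) {
            System.out.println("isAllCountries=true");
        } else {
            System.out.println("isAllCountries=false, supportedCountries=" + supportedCountries(selected));
        }
    }
}
```

Running main prints isAllCountries=false, supportedCountries=DEU,FRA. The order follows declaration because EnumSet iterates constants by ordinal, even though FRA was passed first.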
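For the inputPath and outputPath fields, the path object depends on the file format: FilePath for text, OrcFilePath for ORC, and ParquetFilePath for Parquet. A job that accepts several formats might choose the wrapper from the file extension. The sketch below illustrates that choice under stated assumptions:

- TextPath, OrcPath, and ParquetPath are hypothetical stand-in records, not the SDK classes.
- Choosing by extension is an assumption made only for this sketch. The documented steps have you pick the class explicitly.
- The real FilePath constructors take additional text-file details that are not modeled here.
- The HDFS paths are made-up examples.

```java
import java.util.Locale;

public class PathSelectionSketch {

    // Hypothetical stand-ins for the SDK's FilePath, OrcFilePath and ParquetFilePath.
    interface DataPath { String path(); }
    record TextPath(String path) implements DataPath { }
    record OrcPath(String path) implements DataPath { }
    record ParquetPath(String path) implements DataPath { }

    // Picks a wrapper from the file extension (an assumption of this sketch, not SDK behavior).
    static DataPath forFile(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".orc")) {
            return new OrcPath(path);
        } else if (lower.endsWith(".parquet")) {
            return new ParquetPath(path);
        }
        // Anything else is treated as a text file.
        return new TextPath(path);
    }

    public static void main(String[] args) {
        // Example HDFS paths; these are made up for illustration.
        System.out.println(forFile("/user/hduser/input/addresses.orc"));
        System.out.println(forFile("/user/hduser/output/validated.parquet"));
        System.out.println(forFile("/user/hduser/input/addresses.txt"));
    }
}
```

In a real job, the selected object would correspond to the SDK class assigned to inputPath or outputPath on the GlobalAddressingDetail instance.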