Using a Global Address Validation Spark Job
- Create an instance of AddressValidationFactory, using its static method getInstance().
- Provide the input and output details for the Global Address Validation job by creating an instance of AddressValidationDetail specifying the ProcessType. The instance must use the type SparkProcessType. The steps are:
  - Create an instance of ProductDatabaseInfo, and set these details:
    - ReferenceDataPath: Use Enum ReferenceDataPathLocation
    - CountryCode: Use Enum CountryCodes
    - ProcessType: Use Enum AddressValidationProcessType
  - Create an array list, ProductDatabaseInfoList, and use the add() method to insert the ProductDatabaseInfo instance.
  - Create an instance of AddressValidationEngineConfiguration, and in this instance, set the ProductDatabaseInfoList.
  - Create an instance of AddressValidationInputOption, and set these details on this new instance:
    Note: Use these enums: Enum AddressValidationInputOption.MatchMode, Enum CountryCodes, and Enum Casing.
    - Casing
    - MatchMode
    - DefaultCountry
    - MaximumResults
    - ReturnInputAddress
    - ReturnParsedAddress
    - ReturnPrecisionCode
    - ReturnMatchScore
    - MustMatchAddressNumber
    - MustMatchStreet
    - MustMatchCity
    - MustMatchLocality
    - MustMatchState
    - MustMatchStateProvince
    - MustMatchPostCode
    - KeepMultiMatch
    - PreferPostalOverCity
    - CityFallback
    - PostalFallback
    - ValidationLevel
  - Create an instance of AddressValidationDetail by passing the job configuration and the addressValidationEngineConfiguration and inputOption instances created earlier as arguments to its constructor. Set these details on this instance:
    Note: The Config parameter must be an instance of type MRJobConfig (for an MR job) or SparkJobConfig (for a Spark job).
    - Set the details of the input file using the inputPath field.
      Note:
      - For a text input file, create an instance of FilePath with the relevant details of the input file by invoking the appropriate constructor.
      - For an ORC input file, create an instance of OrcFilePath with the path of the ORC input file as the argument.
      - For a Parquet input file, create an instance of ParquetFilePath with the path of the Parquet input file as the argument.
    - Set the details of the output file using the outputPath field.
      Note:
      - For a text output file, create an instance of FilePath with the relevant details of the output file by invoking the appropriate constructor.
      - For an ORC output file, create an instance of OrcFilePath with the path of the ORC output file as the argument.
      - For a Parquet output file, create an instance of ParquetFilePath with the path of the Parquet output file as the argument.
    - Set the name of the job using the jobName field.
    - Set the compressOutput flag to false to prevent the job output from being compressed.
- To create a Spark job, use the previously created instance of AddressValidationFactory to invoke its method createJob(), passing the above instance of AddressValidationDetail as an argument. The runSparkJob() method runs the job and returns a Map of the reporting counters of the job.
- Display the counters to view the reporting statistics for the job.
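
The last step, displaying the counters, could look roughly like the sketch below. It is only an illustration and not part of the SDK: the CounterReport class and its format() helper are hypothetical, and so are the counter names used in main(). The steps above say that runSparkJob() returns a Map but do not give its key and value types, so the helper assumes String keys and accepts any value type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the SDK.
public class CounterReport {

    // Formats job counters as "name = value" lines, sorted by counter name
    // so that repeated runs print in a stable order.
    // Assumption: counter names are non-null Strings.
    static String format(Map<String, ?> counters) {
        StringBuilder report = new StringBuilder();
        Map<String, Object> sorted = new TreeMap<String, Object>(counters);
        for (Map.Entry<String, Object> entry : sorted.entrySet()) {
            report.append(entry.getKey())
                  .append(" = ")
                  .append(entry.getValue())
                  .append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Placeholder data: in a real job this map would be the value
        // returned by runSparkJob(). These counter names are made up.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("RecordsRead", 1000L);
        counters.put("RecordsFailed", 3L);
        System.out.print(format(counters));
    }
}
```

Sorting by name is a design choice for readable logs; if the order in which the job reports its counters matters, iterate over the returned Map directly instead.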