Configuration Files

Table 1. inputFileConfig
Parameter	Description
pb.bdq.input.type	Input file type. The values can be: `file`, `TEXT`, or `ORC`.
Suspect File
pb.bdq.match.suspect.inputfile.path	Path where you have placed the suspect input file on HDFS. Example: `/user/hduser/sampledata/intermatch/input/Interflow_Suspect.txt`.
pb.bdq.match.suspect.recordseparator	Record delimiter used in the suspect file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.match.suspect.fieldseparator	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.match.suspect.textqualifier	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.match.suspect.header	Headers used in the suspect file. Example: `name`, `firstname`, `lastname`, `matchkey`, `middlename`, and `recordid`.
pb.bdq.match.suspect.skip.firstrow	Whether to skip the first row of the suspect file during processing. The values can be `True` or `False`, where `True` means the first row is skipped.
Candidate File
pb.bdq.match.candidate.inputfile.path	Path where you have placed the candidate input file on HDFS. Example: `/user/hduser/sampledata/intermatch/input/Interflow_candidate.txt`.
pb.bdq.match.candidate.recordseparator	Record delimiter used in the candidate file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.match.candidate.fieldseparator	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.match.candidate.textqualifier	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.match.candidate.header	Headers used in the candidate file. Example: `name`, `firstname`, `lastname`, `matchkey`, `middlename`, and `recordid`.
pb.bdq.match.candidate.skip.firstrow	Whether to skip the first row of the candidate file during processing. The values can be `True` or `False`, where `True` means the first row is skipped.
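
The suspect and candidate parameters in Table 1 follow the same naming pattern, so a small pre-flight check can cover both sides. The following is a minimal sketch, not part of the product: `check_input_config` is a hypothetical helper, the configuration is assumed to be already loaded into a Python dictionary, and treating the two input paths as required is an assumption.

```python
# Hypothetical checker; it mirrors Table 1 and is not the product's own validation.
ALLOWED_INPUT_TYPES = {"file", "TEXT", "ORC"}
SIDES = ("suspect", "candidate")


def check_input_config(cfg):
    """Return a list of problems found in an inputFileConfig-style mapping."""
    problems = []
    if cfg.get("pb.bdq.input.type") not in ALLOWED_INPUT_TYPES:
        problems.append("pb.bdq.input.type must be file, TEXT, or ORC")
    for side in SIDES:
        prefix = f"pb.bdq.match.{side}."
        # Assumption: both input paths are required.
        if not cfg.get(prefix + "inputfile.path"):
            problems.append(prefix + "inputfile.path is required")
        # Only validate skip.firstrow when it is present.
        skip = cfg.get(prefix + "skip.firstrow")
        if skip is not None and str(skip).lower() not in ("true", "false"):
            problems.append(prefix + "skip.firstrow must be True or False")
    return problems


sample = {
    "pb.bdq.input.type": "TEXT",
    "pb.bdq.match.suspect.inputfile.path":
        "/user/hduser/sampledata/intermatch/input/Interflow_Suspect.txt",
    "pb.bdq.match.suspect.skip.firstrow": "False",
    "pb.bdq.match.candidate.inputfile.path":
        "/user/hduser/sampledata/intermatch/input/Interflow_candidate.txt",
    "pb.bdq.match.candidate.skip.firstrow": "False",
}
print(check_input_config(sample))
```

An empty list means no problems were found under these assumptions; it does not guarantee that the job itself accepts the configuration.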

Table 2. interMatchConfig
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `InterMatch`.
pb.bdq.job.name	Name of the job. Default is `InterMatchSample`.
pb.bdq.match.rule	JSON string that defines the match rule. It specifies details such as the match rule hierarchy, the matching method, the method for scoring blank data in a field, the scoring method, and the algorithm used to determine whether the values in a field match.
pb.bdq.match.groupby	Name of the column to be used for grouping records in the match queue.
pb.bdq.reduce.count	Number of reducers to be run. Default is `1`.
pb.bdq.match.express.column	Name of the Express Match Column. If the content of this column matches between the suspect and the candidate, no further processing is needed to determine if the suspect and the candidates are duplicates.
pb.bdq.match.keygenerator.json	JSON string that defines the match key generator rule, such as whether to use expressMatchKey, the name of the matchKeyField, and the algorithm to be used. Note: This parameter is optional.
pb.bdq.match.unique.collectnumber.zero	A `true` value assigns collection number `0` to unique records.
pb.bdq.match.inter.comparison	Inter match comparison options. `returnUniqueCandidates`: Set the value to `true` to return unique records within a match group. `maxNumOfDuplicates`: Specify the maximum number of duplicates to be found before the comparison in the match group stops. Note: This parameter is optional.
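
Table 2 mixes a constant, two defaults, and JSON-valued parameters, which is easy to get wrong when editing by hand. The sketch below shows one way to apply the documented defaults and catch malformed JSON early. `resolve_intermatch_config` is a hypothetical helper, the configuration is assumed to be loaded into a dictionary, `pb.bdq.match.rule` is treated as required (an assumption), and the rule string in the example is a placeholder rather than the real match rule schema.

```python
import json

# Defaults as documented in Table 2.
DEFAULTS = {
    "pb.bdq.job.name": "InterMatchSample",
    "pb.bdq.reduce.count": "1",
}


def resolve_intermatch_config(cfg):
    """Apply documented defaults and run basic sanity checks (hypothetical helper)."""
    resolved = {**DEFAULTS, **cfg}
    if resolved.get("pb.bdq.job.type") != "InterMatch":
        raise ValueError("pb.bdq.job.type must be InterMatch")
    # Assumption: the match rule is required.
    if "pb.bdq.match.rule" not in resolved:
        raise ValueError("pb.bdq.match.rule is required")
    # json.loads raises a ValueError subclass if the string is not valid JSON.
    json.loads(resolved["pb.bdq.match.rule"])
    if "pb.bdq.match.keygenerator.json" in resolved:
        json.loads(resolved["pb.bdq.match.keygenerator.json"])
    if int(resolved["pb.bdq.reduce.count"]) < 1:
        raise ValueError("pb.bdq.reduce.count must be at least 1")
    return resolved


resolved = resolve_intermatch_config({
    "pb.bdq.job.type": "InterMatch",
    # Placeholder only; the real match rule JSON has its own structure.
    "pb.bdq.match.rule": '{"placeholder": "match rule goes here"}',
})
print(resolved["pb.bdq.job.name"], resolved["pb.bdq.reduce.count"])
```

The check confirms only that the strings parse as JSON; it says nothing about whether the rule content is valid for the job.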

Table 3. mapReduceConfig
Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as `mapreduce.map.memory.mb`, `mapreduce.reduce.memory.mb`, and `mapreduce.map.speculative`, as needed for your job.
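
As an illustration of the three parameters named above, the sketch below writes them out in the standard Hadoop `<configuration>`/`<property>` XML layout. Whether mapReduceConfig uses exactly this layout is an assumption; `to_hadoop_xml` is a hypothetical helper, and the values are illustrative rather than recommendations.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(params):
    """Serialize name/value pairs in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in params.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; tune them for your cluster and job.
mapreduce_params = {
    "mapreduce.map.memory.mb": 2048,
    "mapreduce.reduce.memory.mb": 4096,
    "mapreduce.map.speculative": "false",
}
xml_text = to_hadoop_xml(mapreduce_params)
print(xml_text)
```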

Table 4. Output File Configuration
Parameter	Description
pb.bdq.output.type	Specify if the output is in: `file`, `TEXT`, or `ORC` format.
pb.bdq.outputfile.path	The path where you want the output to be generated on HDFS. For example, `/user/hduser/sampledata/intermatch/output`.
pb.bdq.outputformat.field.delimiter	Field or column delimiter in the output file, such as comma (`,`) or tab.
pb.bdq.output.overwrite	If set to `true`, the output folder is overwritten every time the job is run.
pb.bdq.outputformat.headerfile.create	Specify `true` if the output file needs to have a header.
pb.bdq.job.print.counters.console	Whether the counters are printed on the console or to a file. `True` indicates that the counters are printed on the console.
pb.bdq.job.counter.file.path	Path and name of the file to which the counters are to be printed. You need to specify this if the value of pb.bdq.job.print.counters.console is `false`.
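
The last two parameters depend on each other: a counter file path is needed only when the counters are not printed on the console. That rule can be sketched as follows. `counter_destination` is a hypothetical helper operating on a dictionary of settings, the counter file path in the example is made up for illustration, and treating a missing `pb.bdq.job.print.counters.console` as `false` is an assumption.

```python
def counter_destination(cfg):
    """Return 'console' or the counter file path implied by the settings."""
    # Assumption: a missing flag is treated the same as false.
    flag = str(cfg.get("pb.bdq.job.print.counters.console", "false")).lower()
    if flag == "true":
        return "console"
    path = cfg.get("pb.bdq.job.counter.file.path")
    if not path:
        raise ValueError(
            "pb.bdq.job.counter.file.path is required when "
            "pb.bdq.job.print.counters.console is false"
        )
    return path


print(counter_destination({"pb.bdq.job.print.counters.console": "True"}))
print(counter_destination({
    "pb.bdq.job.print.counters.console": "false",
    # Hypothetical path, for illustration only.
    "pb.bdq.job.counter.file.path": "/tmp/intermatch_counters.txt",
}))
```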