Configuration Files

These tables describe the parameters and the values you need to specify before you run the Match Key Generator job.

Table 1. inputFileConfig
Parameter	Description
pb.bdq.input.type	Input file type. The values can be: `file`, `TEXT`, or `ORC`.
pb.bdq.inputfile.path	The path where you have placed the input file on HDFS. For example, /user/hduser/sampledata/matchkeygenerator/input/MatchKey_Input.csv.
textinputformat.record.delimiter	Record delimiter used in a text-type input file. For example, `LINUX`, `MACINTOSH`, or `WINDOWS`.
pb.bdq.inputformat.field.delimiter	Field or column delimiter used in the input file, such as comma (`,`) or tab.
pb.bdq.inputformat.text.qualifier	Text qualifiers, if any, in the columns or fields of the input file.
pb.bdq.inputformat.file.header	Column headers as comma-separated values. For example, `businessname,id,domain`.
pb.bdq.inputformat.skip.firstrow	Whether to skip the first row during processing. The values can be `True` or `False`, where `True` means the first row is skipped.
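
As an illustration, the parameters in Table 1 can be collected as key-value pairs and checked before you submit the job. The following Python sketch is not part of the product: the helper `validate_input_config`, the plain-dictionary representation, and the sample values (including the double-quote text qualifier) are assumptions made for this example, and the on-disk format of inputFileConfig is not covered here.

```python
# Hypothetical sketch: parameter names come from Table 1; every value is a sample assumption.
ALLOWED_INPUT_TYPES = {"file", "TEXT", "ORC"}

input_file_config = {
    "pb.bdq.input.type": "TEXT",
    "pb.bdq.inputfile.path": "/user/hduser/sampledata/matchkeygenerator/input/MatchKey_Input.csv",
    "textinputformat.record.delimiter": "LINUX",
    "pb.bdq.inputformat.field.delimiter": ",",
    "pb.bdq.inputformat.text.qualifier": '"',  # assumed qualifier
    "pb.bdq.inputformat.file.header": "businessname,id,domain",
    "pb.bdq.inputformat.skip.firstrow": "False",
}


def validate_input_config(config):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if config.get("pb.bdq.input.type") not in ALLOWED_INPUT_TYPES:
        problems.append("pb.bdq.input.type must be file, TEXT, or ORC")
    if not config.get("pb.bdq.inputfile.path"):
        problems.append("pb.bdq.inputfile.path is required")
    if config.get("pb.bdq.inputformat.skip.firstrow") not in {"True", "False"}:
        problems.append("pb.bdq.inputformat.skip.firstrow must be True or False")
    return problems


print(validate_input_config(input_file_config))
```

The checks cover only the values that Table 1 lists explicitly; the record delimiter is not validated because the table gives its values as examples rather than as a complete list.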

Table 2. mapReduceConfig
Specifies the MapReduce configuration parameters.
Use this file to customize MapReduce parameters, such as mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, and mapreduce.map.speculative, as needed for your job.
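
For example, the three parameters named above could be rendered in the standard Hadoop `<configuration>`/`<property>` XML layout. This Python sketch is illustrative only: whether mapReduceConfig uses that layout is an assumption, the helper `to_hadoop_xml` is hypothetical, and the memory values are placeholders rather than recommendations.

```python
import xml.etree.ElementTree as ET

# Sample values only; tune them for your cluster and job.
map_reduce_settings = {
    "mapreduce.map.memory.mb": "2048",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.map.speculative": "false",
}


def to_hadoop_xml(settings):
    """Render name/value pairs as Hadoop-style <property> elements (assumed layout)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_hadoop_xml(map_reduce_settings))
```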

Table 3. matchKeyGeneratorConfig
Parameter	Description
pb.bdq.job.type	This is a constant value that defines the job. The value for this job is: `MatchKeyGen`.
pb.bdq.job.name	Name of the job. The default is `MatchKeySample`.
pb.bdq.match.keygenerator.json	JSON string that defines the match key generator rules: the algorithm used to generate the match key, the field to which the algorithm is applied, the starting position within that field, the number of characters to include from the starting position, whether characters that are neither numeric nor alphabetic are removed, and whether the input fields are sorted.
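
To make the rule elements concrete, the following Python sketch applies a simple rule to one record. It is not the product's implementation or its JSON schema: the JSON field names (`algorithm`, `field`, `startPosition`, `length`, `removeNoiseCharacters`), the helper `apply_rule`, and the stand-in substring algorithm are all hypothetical. It also assumes a 1-based starting position and that non-alphanumeric characters are removed before the substring is taken. Sorting of the input is left out.

```python
import json
import re

# Hypothetical rule; these field names are NOT the product's JSON schema.
rule_json = json.dumps({
    "algorithm": "Substring",        # stand-in algorithm for this sketch
    "field": "businessname",
    "startPosition": 1,              # assumed to be 1-based
    "length": 5,
    "removeNoiseCharacters": True,
})


def apply_rule(record, rule):
    """Build a key from one field: optional cleanup, then a fixed-length slice."""
    value = record[rule["field"]]
    if rule["removeNoiseCharacters"]:
        # Drop every character that is neither a letter nor a digit.
        value = re.sub(r"[^A-Za-z0-9]", "", value)
    start = rule["startPosition"] - 1
    return value[start:start + rule["length"]]


rule = json.loads(rule_json)
key = apply_rule({"businessname": "A&B Foods, Inc.", "id": "1"}, rule)
print(key)  # prints "ABFoo"
```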

Table 4. outputFileConfig
Parameter	Description
pb.bdq.output.type	Output file type. The values can be: `file`, `TEXT`, or `ORC`.
pb.bdq.outputfile.path	The path where you want the output file to be generated on HDFS.
pb.bdq.outputformat.field.delimiter	Field or column delimiter in the output file, such as comma (`,`) or tab.
pb.bdq.output.overwrite	If `true`, the output folder is overwritten each time the job runs.
pb.bdq.outputformat.headerfile.create	Specify `true` if the output file needs a header.
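
The effect of the field delimiter and header settings in Table 4 can be sketched as follows. This Python example only imitates the behavior in memory and does not write to HDFS. The helper `render_rows`, the placeholder output path, the `matchkey` column name, and the sample row are assumptions made for illustration.

```python
import csv
import io

# Parameter names come from Table 4; every value is a sample assumption.
output_file_config = {
    "pb.bdq.output.type": "TEXT",
    "pb.bdq.outputfile.path": "/path/to/output",  # placeholder, not a real location
    "pb.bdq.outputformat.field.delimiter": ",",
    "pb.bdq.output.overwrite": "true",
    "pb.bdq.outputformat.headerfile.create": "true",
}


def render_rows(rows, columns, config):
    """Format rows with the configured delimiter, adding a header row if requested."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=config["pb.bdq.outputformat.field.delimiter"],
        lineterminator="\n",
    )
    if config["pb.bdq.outputformat.headerfile.create"].lower() == "true":
        writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


sample = render_rows(
    [["Acme", "1", "ACME"]],
    ["businessname", "id", "matchkey"],  # "matchkey" is a hypothetical column name
    output_file_config,
)
print(sample)
```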