Special Scenarios
Records with Blank Group-By Column
All records with a blank group-by value are marked as malformed and written to separate files in the output HDFS folder. These files are named as follows:
- Malformed Records in Candidate Files
  - Candidate file records with a blank group-by column are treated as malformed: they are excluded from processing and written to files named malformedRecordsCandidate-m-<5-digit number>.
    For example, malformedRecordsCandidate-m-00000, malformedRecordsCandidate-m-00001.
    This applies to Interflow Match jobs.
- Malformed Records in Suspect Files
  - Suspect file records with a blank group-by column are treated as malformed: they are excluded from processing and written to files named malformedRecordsSuspect-m-<5-digit number>.
    For example, malformedRecordsSuspect-m-00000, malformedRecordsSuspect-m-00001.
    This applies to Interflow Match jobs.
- Malformed Records in Input Files
  - Input file records with a blank group-by column are treated as malformed: they are excluded from processing and written to files named malformedRecords-m-<5-digit number>.
    For example, malformedRecords-m-00000, malformedRecords-m-00001.
    This applies to Intraflow Match, Transactional Match, Best of Breed, Duplicate Synchronization, and Filter jobs.
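As a rough sketch, a post-processing step could recognize these files by name when scanning the output folder. The class below is hypothetical and not part of the SDK; it assumes the names follow exactly the <prefix>-m-<5-digit number> pattern described above and works only on name strings (it does not access HDFS).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the SDK): builds and recognizes file names
// that follow the convention <prefix>-m-<5-digit number>.
final class MalformedFileNames {

    // Which input the malformed records came from.
    enum Source { CANDIDATE, SUSPECT, INPUT }

    // One of the three documented prefixes, then "-m-", then exactly five digits.
    private static final Pattern NAME = Pattern.compile(
        "(malformedRecordsCandidate|malformedRecordsSuspect|malformedRecords)-m-(\\d{5})");

    private MalformedFileNames() { }

    static String prefix(Source source) {
        switch (source) {
            case CANDIDATE: return "malformedRecordsCandidate";
            case SUSPECT:   return "malformedRecordsSuspect";
            default:        return "malformedRecords";
        }
    }

    // For example, fileName(Source.SUSPECT, 1) gives "malformedRecordsSuspect-m-00001".
    static String fileName(Source source, int index) {
        if (index < 0 || index > 99_999) {
            throw new IllegalArgumentException("index must fit in five digits: " + index);
        }
        return String.format("%s-m-%05d", prefix(source), index);
    }

    // Returns the source for a malformed-record file name, or empty for any other name.
    static Optional<Source> classify(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        switch (m.group(1)) {
            case "malformedRecordsCandidate": return Optional.of(Source.CANDIDATE);
            case "malformedRecordsSuspect":   return Optional.of(Source.SUSPECT);
            default:                          return Optional.of(Source.INPUT);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "malformedRecordsCandidate-m-00000",
            "malformedRecords-m-00001",
            "part-m-00000"
        };
        for (String name : names) {
            String label = classify(name).map(Source::name).orElse("not a malformed-record file");
            System.out.println(name + " -> " + label);
        }
    }
}
```

Running main labels the first name CANDIDATE and the second INPUT, and reports the third name as not a malformed-record file.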
Counters for Malformed Records
The number of malformed records in a job run is stored in the following counters:
- MALFORMED_CANDIDATE_RECORDS
- MALFORMED_SUSPECT_RECORDS
- MALFORMED_RECORDS
Note: The values in these counters can be accessed by invoking the getCounters() method of the AdvanceMatchFactory instance.
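This section does not describe what getCounters() returns, so the following sketch stops short of calling it. It assumes the counter values have already been read into a Map<String, Long> keyed by counter name, and the JobType enum is hypothetical. It shows one way to check a run for malformed records; the pairing of MALFORMED_CANDIDATE_RECORDS and MALFORMED_SUSPECT_RECORDS with Interflow Match, and of MALFORMED_RECORDS with the other jobs, is inferred from the file descriptions rather than stated outright.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-run check (not part of the SDK). Assumes the counter values
// were already copied into a Map keyed by counter name.
final class MalformedRecordReport {

    // Counter names as documented.
    static final String CANDIDATE = "MALFORMED_CANDIDATE_RECORDS";
    static final String SUSPECT = "MALFORMED_SUSPECT_RECORDS";
    static final String INPUT = "MALFORMED_RECORDS";

    // Hypothetical job-type enum covering the jobs named in this section.
    enum JobType {
        INTERFLOW_MATCH, INTRAFLOW_MATCH, TRANSACTIONAL_MATCH,
        BEST_OF_BREED, DUPLICATE_SYNCHRONIZATION, FILTER
    }

    private MalformedRecordReport() { }

    // Inferred pairing: Interflow Match reads candidate and suspect files,
    // every other job reads a single input.
    static List<String> countersFor(JobType job) {
        return job == JobType.INTERFLOW_MATCH
            ? Arrays.asList(CANDIDATE, SUSPECT)
            : Collections.singletonList(INPUT);
    }

    // Sums the relevant counters; a counter absent from the map counts as 0.
    static long totalMalformed(JobType job, Map<String, Long> counters) {
        long total = 0;
        for (String name : countersFor(job)) {
            total += counters.getOrDefault(name, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values only, not output from a real job.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put(CANDIDATE, 3L);
        counters.put(SUSPECT, 5L);

        long total = totalMalformed(JobType.INTERFLOW_MATCH, counters);
        System.out.println("Malformed records: " + total);
        if (total > 0) {
            System.out.println("See the malformedRecords* files in the output HDFS folder.");
        }
    }
}
```

With the sample values, main prints "Malformed records: 8" followed by the reminder line. Treating a missing counter as 0 keeps the check usable for jobs that never populate the counters of the other job type.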