Flattening Complex XML Elements

Most stages in a dataflow require data to be in a flat format. This means that when you read hierarchical data from an XML file into a dataflow, you must flatten it if the data contains complex XML elements. A complex XML element is an element that contains other elements or attributes. For example, in the following data file, the <address> and <account> elements are complex XML elements:

<customers>
    <customer>
        <name>Sam</name>
        <gender>M</gender>
        <age>43</age>
        <country>United States</country>
        <address>
            <addressline1>1253 Summer St.</addressline1>
            <city>Boston</city>
            <stateprovince>MA</stateprovince>
            <postalcode>02110</postalcode>
        </address>
        <account>
            <type>Savings</type>
            <number>019922</number>
        </account>
    </customer>
    <customer>
        <name>Jeff</name>
        <gender>M</gender>
        <age>32</age>
        <country>Canada</country>
        <address>
            <addressline1>26 Wellington St.</addressline1>
            <city>Toronto</city>
            <stateprovince>ON</stateprovince>
            <postalcode>M5E 1S2</postalcode>
        </address>
        <account>
            <type>Checking</type>
            <number>238832</number>
        </account>
    </customer>
    <customer>
        <name>Mary</name>
        <gender>F</gender>
        <age>61</age>
        <country>Australia</country>
        <address>
            <addressline1>Level 7, 1 Elizabeth Plaza</addressline1>
            <city>North Sydney</city>
            <stateprovince>NSW</stateprovince>
            <postalcode>2060</postalcode>
        </address>
        <account>
            <type>Savings</type>
            <number>839938</number>
        </account>
    </customer>
</customers>
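
To make the definition concrete, the following Python sketch checks which children of a customer record are complex XML elements. It is an illustration only and is not part of the dataflow product: it uses Python's standard xml.etree.ElementTree module, the helper names is_complex and complex_children are hypothetical, and the sample is a shortened version of the data file above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample data file (one customer, fewer fields).
SAMPLE = """<customers>
    <customer>
        <name>Sam</name>
        <country>United States</country>
        <address>
            <city>Boston</city>
            <stateprovince>MA</stateprovince>
        </address>
        <account>
            <type>Savings</type>
            <number>019922</number>
        </account>
    </customer>
</customers>"""


def is_complex(element):
    """Return True if the element contains child elements or attributes."""
    return len(element) > 0 or bool(element.attrib)


def complex_children(record):
    """Return the tag names of the complex children of a record element."""
    return [child.tag for child in record if is_complex(child)]


root = ET.fromstring(SAMPLE)
first_customer = root.find("customer")
print(complex_children(first_customer))  # prints ['address', 'account']
```

Each tag the sketch reports corresponds to one complex XML element that needs to be flattened before the data reaches a stage that requires flat input.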

The following procedure describes how to use Splitter stages to flatten XML data containing multiple complex XML elements.

Note: If your data contains a single complex XML element, you can flatten it with a single Splitter stage by connecting the Read from XML stage directly to the Splitter stage. In that case you do not need the Broadcaster and Record Combiner stages described in this procedure.

  1. Add a Read from XML stage to your dataflow and configure the stage. For more information, see Read From XML.
  2. Add a Broadcaster stage and connect Read from XML to it.
  3. Add a Splitter stage for each complex XML element in your data.
  4. Connect the Broadcaster stage to each Splitter.
  5. Add a Record Combiner stage and connect each Splitter to it.

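    You should now have a dataflow in which Read from XML feeds the Broadcaster, the Broadcaster feeds each Splitter, and each Splitter feeds the Record Combiner.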

  6. Double-click the first Splitter stage to open the stage options.
  7. In the Split at field, select one of the complex fields. In the example data file above, this could be the address field.
  8. Click OK.
  9. Configure each additional Splitter stage, selecting a different complex XML element in each Splitter's Split at field.

The dataflow is now configured to take XML input containing records with complex XML elements and flatten the data. The resulting records from Record Combiner can be sent to any stage that requires flat data. For example, you could attach the Record Combiner stage to a Validate Address stage for address validation.
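
As a rough picture of what "flat" means here, the following Python sketch turns a shortened version of the sample data into one flat record per customer. It is not how the Splitter or Record Combiner stages are implemented, and the column names it produces (for example address_city) are an assumption made for illustration; the field names the stages actually emit may differ. The sketch handles only one level of nesting and ignores attributes.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample data file (two customers, fewer fields).
SAMPLE = """<customers>
    <customer>
        <name>Sam</name>
        <address>
            <city>Boston</city>
            <postalcode>02110</postalcode>
        </address>
        <account>
            <type>Savings</type>
            <number>019922</number>
        </account>
    </customer>
    <customer>
        <name>Jeff</name>
        <address>
            <city>Toronto</city>
            <postalcode>M5E 1S2</postalcode>
        </address>
        <account>
            <type>Checking</type>
            <number>238832</number>
        </account>
    </customer>
</customers>"""


def flatten(record):
    """Flatten one record into a dict of simple values.

    Simple children keep their tag as the key. Children of a complex
    element get a hypothetical "<parent>_<child>" key.
    """
    flat = {}
    for child in record:
        if len(child) == 0:
            # Simple element: copy its text directly.
            flat[child.tag] = child.text
        else:
            # Complex element: lift each nested field to the top level.
            for leaf in child:
                flat[f"{child.tag}_{leaf.tag}"] = leaf.text
    return flat


rows = [flatten(c) for c in ET.fromstring(SAMPLE).findall("customer")]
for row in rows:
    print(row)
```

Each printed dictionary has no nested values, which is the shape a stage that requires flat data can consume.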