Using Iteration with an Embedded Dataflow

Iteration settings specify how an embedded dataflow should process incoming records. By default, an embedded dataflow processes each record individually, just as any other stage in the dataflow would. If you use iteration, you can process groups of records together, which is useful for performing comparisons or calculations based on groups of records rather than the entire set of input data. You can also use iteration to set stage options based on the data in each record.

There are two kinds of iteration: per-record iteration and per-group iteration. In per-record iteration, an embedded dataflow processes one record at a time and the result is sent along to the next stage following the embedded dataflow. Per-record iteration is useful if you want to set stage options on a record-by-record basis using field values.
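The idea of setting a stage option from a field value on each record can be modeled roughly in Python. This is only a conceptual sketch: the `run_stage` function, the `MaxLength` option, and the `Name` and `Limit` fields are all hypothetical and do not correspond to any actual stage or option in the product.

```python
def run_stage(record, options):
    """Hypothetical stage: truncate the Name field to the configured length."""
    return {**record, "Name": record["Name"][: options["MaxLength"]]}


def iterate_per_record(records, option_field, option_name):
    """Process each record in its own iteration, taking the option from the record."""
    output = []
    for record in records:
        # The stage option is set from a field value in the current record.
        options = {option_name: record[option_field]}
        output.append(run_stage(record, options))
    return output


# Hypothetical sample records; field names are illustrative only.
records = [
    {"Name": "Alexander", "Limit": 4},
    {"Name": "Beatrice", "Limit": 3},
]

result = iterate_per_record(records, option_field="Limit", option_name="MaxLength")
print([r["Name"] for r in result])
```

Each record is handled with its own option value, which is the behavior that per-record iteration makes possible.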

In per-group iteration, records are grouped by a key field and the embedded dataflow processes each group. All the records in a group are processed in one iteration, then the group is written to the next stage following the embedded dataflow. Use per-group iteration to perform processing on groups of related records, as well as to set stage options to use when processing the group of records. For example, you might want to group records by customer ID so that you can perform an analysis of each customer's records, perhaps to determine which store each customer visits most often.
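The customer example above can be sketched in Python to show what one iteration per group amounts to. The field names (`CustomerID`, `Store`) and the sample records are hypothetical, and the code models the concept only; it is not how the platform implements iteration.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records; field names are illustrative only.
records = [
    {"CustomerID": "C1", "Store": "North"},
    {"CustomerID": "C2", "Store": "East"},
    {"CustomerID": "C1", "Store": "North"},
    {"CustomerID": "C1", "Store": "South"},
    {"CustomerID": "C2", "Store": "East"},
]


def most_visited_store(group):
    """Stand-in for the embedded dataflow: process one group of records."""
    counts = Counter(r["Store"] for r in group)
    return counts.most_common(1)[0][0]


def iterate_per_group(records, key_field):
    """Run one 'iteration' per group of records that share the key field value."""
    # Sort first so that all records for a key are contiguous.
    ordered = sorted(records, key=itemgetter(key_field))
    results = {}
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        results[key] = most_visited_store(list(group))
    return results


favorites = iterate_per_group(records, "CustomerID")
print(favorites)
```

The per-group function sees every record for one customer at once, which is what makes a "most often" calculation possible.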

You should consider the impact on performance when using iteration. Each time a new iteration starts, there is some overhead during the initialization of the embedded dataflow, and this overhead can be significant, especially if you have embedded dataflows within other embedded dataflows. For example, if an embedded dataflow iterates 1,000 times and it contains within it another embedded dataflow that also iterates 1,000 times, the total number of iterations would be 1,000,000. Using per-record iteration has a more significant impact on performance since each record kicks off a new iteration.
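The arithmetic behind the nested example can be checked with a short Python sketch. The `overhead_ms_per_iteration` value is a made-up placeholder used only to show how initialization cost scales with the iteration count; it is not a measured figure for any real dataflow.

```python
from math import prod


def total_iterations(*iterations_per_level):
    """Multiply the iteration counts of embedded dataflows nested inside each other."""
    return prod(iterations_per_level)


def total_overhead_ms(iterations, overhead_ms_per_iteration):
    """Estimate cumulative initialization overhead for a given iteration count."""
    return iterations * overhead_ms_per_iteration


nested = total_iterations(1000, 1000)
print(nested)

# Placeholder cost per iteration (assumption, not a measurement).
overhead = total_overhead_ms(nested, overhead_ms_per_iteration=5)
print(overhead)
```

Even a small per-iteration cost grows quickly once iteration counts multiply, so it is worth checking whether an inner embedded dataflow really needs iteration enabled.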

  1. Create an embedded dataflow containing the stage or stages that you use for iteration.
    Note: There are some limitations to what can be included in embedded dataflows that have iteration enabled:
    • The Stream Combiner stage cannot be the first stage in an embedded dataflow that has iteration enabled.
    • The embedded dataflow cannot contain a sink that writes to a file located on the client. Sinks inside an embedded dataflow must write to a file on the Spectrum™ Technology Platform server or on a file server.
  2. Double-click the embedded dataflow icon.
  3. Check the Enable iteration check box.
  4. If there is more than one input channel connected to the embedded dataflow, use the Port field to choose the port whose records you want to use to drive iteration.

    For example, say you have two input ports, A and B, and you choose to iterate each time a key field changes. If you choose to use port B for iteration, the embedded dataflow will start a new iteration each time a key field in the records from port B changes. All the records from the other port, port A, will be read into the embedded dataflow, cached, and used for each iteration.

  5. Select the type of iteration you want to perform.
    Iterate each time a key field changes
    In this type of iteration, the embedded dataflow processes groups of records that have the same value in one or more fields. When the embedded dataflow finishes processing the group of records, the embedded dataflow resets and a new group of records is processed. Use this type of iteration to create embedded dataflows that process groups of records and then output each record group separately.
    Tip: If you choose this type of iteration, you can improve performance by placing a Sorter stage in front of the embedded dataflow and sorting the records by the key field.
    Iterate per record
    In this type of iteration, the embedded dataflow processes one record at a time. Every time one record completes the embedded dataflow processing, the result is sent to the output and a new record is processed. Embedded dataflows that iterate per record handle each record as a new dataflow execution.
  6. If you choose Iterate each time a key field changes, check the box Ignore case when comparing values if you want to ignore differences in case when evaluating key field values to determine record groups.
  7. Specify one or more key fields.
    1. Click Add.
    2. Choose the field you want to use as a key field.
    3. If you want to use the field's value to set a stage option within the embedded dataflow, specify the name of the option you want to set.
    4. Click OK.
    5. Add additional key fields if needed.

      If you have more than one key field and you chose the option Iterate each time a key field changes, records must contain the same value in all key fields to be grouped together.
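The grouping rules in the steps above (a new iteration when a key field changes, optional case-insensitive comparison, and matching on all key fields) can be modeled with a short Python sketch. The `State` and `City` fields and the sample records are hypothetical. The sketch assumes that a new group starts whenever the combined key differs from that of the previous record, which is one plausible reading of "each time a key field changes"; the product's internal behavior may differ.

```python
from itertools import groupby


def group_key(record, key_fields, ignore_case=False):
    """Build the comparison key from all key fields of a record."""
    values = tuple(str(record[f]) for f in key_fields)
    return tuple(v.lower() for v in values) if ignore_case else values


def iterate_on_key_change(records, key_fields, ignore_case=False):
    """Yield one list of records per run of consecutive records with equal keys."""
    for _, run in groupby(records, key=lambda r: group_key(r, key_fields, ignore_case)):
        yield list(run)


# Hypothetical sample records; field names are illustrative only.
records = [
    {"State": "NY", "City": "Albany"},
    {"State": "ny", "City": "albany"},
    {"State": "NY", "City": "Buffalo"},
    {"State": "NY", "City": "Albany"},
]
key_fields = ["State", "City"]

# Case-sensitive comparison: every record differs from the one before it.
exact = list(iterate_on_key_change(records, key_fields))

# Ignoring case merges the first two records into one group.
relaxed = list(iterate_on_key_change(records, key_fields, ignore_case=True))

# Sorting by the key fields first makes equal keys contiguous.
sorted_records = sorted(records, key=lambda r: group_key(r, key_fields, ignore_case=True))
sorted_relaxed = list(iterate_on_key_change(sorted_records, key_fields, ignore_case=True))

print(len(exact), len(relaxed), len(sorted_relaxed))
```

In this model, sorting on the key fields reduces the number of iterations because records with the same key arrive together, which may be part of why placing a Sorter stage in front of the embedded dataflow helps performance.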