Defining Fields In an Output Sequence File

In the Write to Hadoop Sequence File stage, the Fields tab defines the names, positions, and types of fields in the file. After you define the output file on the File Properties tab, you can define the fields.

  1. To select the desired fields from the input data or an existing file, click Quick Add.
    1. Select the specific fields from the input data.
    2. Click OK.
  2. To add new fields, click Add.
    1. Enter the Name of the field.
    2. Select the Type of the field. The stage supports the following data types:
      boolean
      A logical type with two values: true and false.
      date
      A data type that contains a month, day, and year. For example, 2012-01-30 or January 30, 2012. You can specify a default date format in Management Console.
      datetime
      A data type that contains a month, day, year, and hours, minutes, and seconds. For example, 2012/01/30 6:15:00 PM.
      Note: In Parquet files, the datetime and time data types are mapped to string. In RC files, the datetime data type is mapped to timestamp.
      double
      A numeric data type that contains both negative and positive double precision numbers between 2⁻¹⁰⁷⁴ and (2-2⁻⁵²)×2¹⁰²³. In E notation, the range of values is -1.79769313486232E+308 to 1.79769313486232E+308.
      float
      A numeric data type that contains both negative and positive single precision numbers between 2⁻¹⁴⁹ and (2-2⁻²³)×2¹²⁷. In E notation, the range of values is -3.402823E+38 to 3.402823E+38.
      integer
      A numeric data type that contains both negative and positive whole numbers between -2³¹ (-2,147,483,648) and 2³¹-1 (2,147,483,647).
      bigdecimal
      A numeric data type that supports 38 decimal digits of precision. Use this data type for data that will be used in mathematical calculations requiring a high degree of precision, especially those involving financial data. The bigdecimal data type supports more precise calculations than the double data type.
      Note: For RC, Avro, and Parquet Hive files, the bigdecimal data type is converted to a decimal data type with precision 38 and scale 10.
      long
      A numeric data type that contains both negative and positive whole numbers between -2⁶³ (-9,223,372,036,854,775,808) and 2⁶³-1 (9,223,372,036,854,775,807).
      Note: In RC files, the long data type is mapped to the bigint data type.
      string
      A sequence of characters.
    3. In the Position field, enter the position of this field within the record.

      For example, in this input file, AddressLine1 is in position 1, City is in position 2, StateProvince is in position 3, and PostalCode is in position 4. The positional-fields sketch after this procedure shows how these positions map to field values.

      "AddressLine1"|"City"|"StateProvince"|"PostalCode"
      "7200 13TH ST"|"MIAMI"|"FL"|"33144"
      "One Global View"|"Troy"|"NY"|12180
  3. If you're overwriting an existing file, click Regenerate to pick the schema from the existing file, and then modify it.
    This generates the schema based on the first 50 records in the input data to this stage.
  4. To remove any excess space characters from the beginning and end of a field's character string, select the Trim Spaces check box.
  5. Specify one of the following options to generate the key (see the key-generation sketch after this procedure):
    Auto Generate

    The Hadoop cluster auto-generates the key. For auto generation, all the fields in the grid are considered value fields. The data type of the key is long.

    User Defined

    By default, the first field in the grid is selected as the key field. An icon is displayed to indicate that the field is the key field. You can select any other field as the key field.
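
Positional-fields sketch. The following is a minimal, hypothetical Java illustration, not part of the stage itself, of how positions 1 through 4 in the sample pipe-delimited record map to the AddressLine1, City, StateProvince, and PostalCode fields, with the surrounding quotes removed and excess spaces trimmed (the effect of the Trim Spaces option). The class name and printed output are illustrative.

  // Illustration only: how positional fields line up with one of the sample records.
  public class PositionalFieldsSketch {
      public static void main(String[] args) {
          String record = "\"7200 13TH ST\"|\"MIAMI\"|\"FL\"|\"33144\"";

          // Split on the pipe delimiter; array index 0 corresponds to position 1.
          String[] values = record.split("\\|");
          String[] names = {"AddressLine1", "City", "StateProvince", "PostalCode"};

          for (int i = 0; i < values.length; i++) {
              // Strip the surrounding quotes and trim excess spaces
              // (the equivalent of selecting Trim Spaces).
              String value = values[i].replace("\"", "").trim();
              System.out.println(names[i] + " (position " + (i + 1) + ") = " + value);
          }
      }
  }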
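
Key-generation sketch. The Auto Generate and User Defined options can be pictured in terms of the standard Hadoop org.apache.hadoop.io.SequenceFile.Writer API. The sketch below is only an illustration under that assumption, not the stage's implementation; the output paths and records are placeholders. With Auto Generate, the key is a long that simply counts records and all the fields in the grid form the value; with User Defined, a chosen field (here the first one, AddressLine1) becomes the key.

  // Illustration only: the two key options expressed with the Hadoop SequenceFile API.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class KeyGenerationSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          String[] records = {
              "\"7200 13TH ST\"|\"MIAMI\"|\"FL\"|\"33144\"",
              "\"One Global View\"|\"Troy\"|\"NY\"|12180"
          };

          // Auto Generate: the key is a long record counter; every field is part of the value.
          SequenceFile.Writer autoWriter = SequenceFile.createWriter(conf,
                  SequenceFile.Writer.file(new Path("/tmp/addresses_auto.seq")),   // placeholder path
                  SequenceFile.Writer.keyClass(LongWritable.class),                // key data type is long
                  SequenceFile.Writer.valueClass(Text.class));
          try {
              long key = 0;
              for (String record : records) {
                  autoWriter.append(new LongWritable(key++), new Text(record));
              }
          } finally {
              IOUtils.closeStream(autoWriter);
          }

          // User Defined: the first field (AddressLine1) is the key; the rest form the value.
          SequenceFile.Writer userWriter = SequenceFile.createWriter(conf,
                  SequenceFile.Writer.file(new Path("/tmp/addresses_user.seq")),   // placeholder path
                  SequenceFile.Writer.keyClass(Text.class),
                  SequenceFile.Writer.valueClass(Text.class));
          try {
              for (String record : records) {
                  int firstDelimiter = record.indexOf('|');
                  userWriter.append(new Text(record.substring(0, firstDelimiter)),
                                    new Text(record.substring(firstDelimiter + 1)));
              }
          } finally {
              IOUtils.closeStream(userWriter);
          }
      }
  }

Reading either file back with org.apache.hadoop.io.SequenceFile.Reader would return LongWritable keys for the first file and Text keys for the second, matching the two key options described above.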

After defining the fields in your output file, you can edit its contents and layout using the following options.

Add
Adds a field to the output. You can append a field to the end of the existing layout, or you can insert a field into an existing position, in which case the positions of the remaining fields are adjusted accordingly.

Modify
Modifies the selected field's name and type.

Remove
Removes the selected field from the output.

Move Up/Move Down
Reorders the selected field within the layout.