Creating a Hive Connection

  1. Open the Load to Hive activity.
  2. From the File name field, enter the name of the file, which is to be read. Click Browse [...] to select the file to be read.
  3. From the File type field, select the format of the file to be read. The default file format selected is Delimited.
    If the File type selected is either Delimited or Sequence, the fields Field Separator and Record Separator are displayed. Else, they are not displayed.
  4. In the Field Separator field, select the character that separates each consecutive field of a record.
  5. Select the connection for the Hive database you want to use in the Connection field.
    1. To add, modify, and delete connections, click Manage.
      The Database Connection Manager window opens.
    2. Click Add to create a new connection, or Modify to edit an existing connection.
      The Connection Properties window opens.
    3. Enter the Connection Name.
    4. In the Database Driver field, select a Hive database driver for the connection.
    5. Specify all the details of the connection, namely user, password, host, port, and instance.
    6. To test the connection details, click Test.
    7. If the connection test is successful, click OK.
      The Connection Properties window closes.
    8. Click OK.
      The Database Connection Manager window closes.
  6. In the Table/View field, select the table you wish to write to, or type in the name of a new table to be created.
    If you create a new table in the Table/View field, the External checkbox gets enabled. Else, if you select an existing table, the External checkbox remains disabled.
  7. To create the new table external to the Hive database, check the External checkbox.
    Important: In case of External tables:
    1. Existing records cannot be overwritten, and new records cannot be added. You are only allowed to create new external tables and populate them with records.
    2. If you select a file placed in a particular folder, all files placed in that folder are automatically selected. Hence, ensure all files placed in the particular folder have the same format.
    Learn more about Hive EXTERNAL tables here.
  8. To overwrite all existing records of the table, check the Overwrite checkbox. This deletes existing records of the selected table, and adds the records read from the file to the table.
  9. The grid displays the names and datatypes of the columns of the selected table.
    If you have specified a new table in the Table/View field, use the Add, Modify and Remove buttons beside the grid to add columns to define the table, and specify their respective datatypes. Use the Move Up and Move Down buttons to specify the sequence of the table columns.
    Note: The Add, Modify, Remove, Move Up, and Move Down buttons remain disabled if you select an existing table in the Table/View field.
    Important:
    1. Ensure the datatypes of all the fields in the file match the datatypes of the respective table columns, unless all datatypes are of String type. Else, the data load may result in inconsistent data.
    2. Ensure the number of fields in the file match the number of table columns. Else, the data in the extra fields in the file are discarded.
    3. Hive accepts names of tables and columns in small case only. If you enter the names using block letters, Hive converts them to small case. The resultant schema displays all names in small case.
  10. Click OK.
Note: If you opt to create a new table and define its columns, the same is created at runtime. The Load to Hive activity is only to design the table structure. At runtime, the designed table is created and the data read from the file is written into it.