Load to Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. Hive has its own query language, HiveQL, which you use to query the underlying data source.

Hive supports the following Hadoop file formats:
  • TEXTFILE
  • SEQUENCEFILE
  • ORC
  • RCFILE
  • PARQUET
  • AVRO
    Note: The AVRO file format is supported in Hive version 0.14 and higher.
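
As an illustration of how the formats listed above map to HiveQL, the following minimal sketch builds the STORED AS clause of a table definition. The helper name stored_as_clause and its hive_version parameter are hypothetical and are not part of Spectrum or Hive; the sketch only produces a string and does not connect to Hive.

```python
# File formats listed in this topic, written as HiveQL storage keywords.
HIVE_FORMATS = ("TEXTFILE", "SEQUENCEFILE", "ORC", "RCFILE", "PARQUET", "AVRO")


def stored_as_clause(file_format, hive_version=(0, 14)):
    """Return a HiveQL STORED AS clause for one of the listed formats.

    Hypothetical helper for illustration; hive_version is a (major, minor)
    tuple supplied by the caller, not detected from a server.
    """
    # Normalize case and spacing, so "sequence file" becomes "SEQUENCEFILE".
    fmt = file_format.strip().upper().replace(" ", "")
    if fmt not in HIVE_FORMATS:
        raise ValueError(f"Unsupported file format: {file_format!r}")
    # Per the note above, AVRO needs Hive 0.14 or higher.
    if fmt == "AVRO" and hive_version < (0, 14):
        raise ValueError("AVRO requires Hive 0.14 or higher")
    return f"STORED AS {fmt}"


print(stored_as_clause("orc"))  # prints: STORED AS ORC
```

Keeping the format list in one place means the version check for AVRO is applied before any statement is sent to Hive.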

The Load to Hive activity loads data into a Hive table over a JDBC connection. Through this connection, data is read from a specified Hadoop file and loaded into either an existing table or a newly created table in the selected connection.

To load the data into a new table, you must define the schema of the table.
Note: Spectrum does not support hierarchical data, even though Hive supports it.
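
To make the schema requirement concrete, the following sketch builds a HiveQL CREATE TABLE statement from a flat list of columns and rejects Hive's complex types (ARRAY, MAP, STRUCT, UNIONTYPE), in line with the note on hierarchical data. It is an assumption about what a flat schema definition might look like, not a description of how Spectrum creates tables. The function create_table_statement and the table and column names are hypothetical.

```python
import re

# Hive complex (hierarchical) types, rejected here to keep the schema flat.
COMPLEX_TYPE_PREFIXES = ("ARRAY", "MAP", "STRUCT", "UNIONTYPE")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_table_statement(table, columns, file_format="TEXTFILE"):
    """Build a CREATE TABLE statement from (column_name, hive_type) pairs.

    Hypothetical helper for illustration; it returns a string only and
    does not connect to Hive.
    """
    if not IDENTIFIER.fullmatch(table):
        raise ValueError(f"Invalid table name: {table!r}")
    if not columns:
        raise ValueError("A new table needs at least one column")
    parts = []
    for name, hive_type in columns:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid column name: {name!r}")
        type_upper = hive_type.strip().upper()
        # Reject types such as ARRAY<INT> or STRUCT<a:INT,b:STRING>.
        if type_upper.startswith(COMPLEX_TYPE_PREFIXES):
            raise ValueError(f"Hierarchical type not allowed: {hive_type!r}")
        parts.append(f"`{name}` {type_upper}")
    column_list = ", ".join(parts)
    return f"CREATE TABLE `{table}` ({column_list}) STORED AS {file_format}"


# Example with made-up table and column names.
print(create_table_statement("customers", [("id", "int"), ("name", "string")], "ORC"))
```

A column declared as, for example, array<string> raises a ValueError in this sketch, so a hierarchical schema is caught before a statement is ever issued.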