Load to Hive
Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. To query the underlying data source through Hive, use Hive's own query language, HiveQL.
Hive supports the following Hadoop file formats:
- TEXTFILE
- SEQUENCEFILE
- ORC
- RCFILE
- PARQUET
- AVRO
Note: The AVRO file format is supported in Hive version 0.14 and higher.
The Load to Hive activity loads data into a Hive table over a JDBC connection. Using this connection, data is read from a specified Hadoop file and loaded either into an existing table or into a newly created table in the selected connection.
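For comparison, plain HiveQL has a LOAD DATA statement that moves a file already stored in Hadoop into an existing table. The following Python sketch only builds such a statement as a string; it is an illustration and does not describe how the Load to Hive activity works internally. The function name build_load_statement, the file path, and the table name sales are hypothetical.

```python
def build_load_statement(hdfs_path: str, table: str, overwrite: bool = False) -> str:
    """Return a HiveQL LOAD DATA statement for a file already in Hadoop."""
    # Reject quotes so the path cannot break out of the string literal.
    if "'" in hdfs_path:
        raise ValueError("path must not contain single quotes")
    # OVERWRITE replaces the table's current contents; without it, rows are appended.
    mode = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{hdfs_path}' {mode}INTO TABLE {table}"


# Hypothetical path and table name, for illustration only.
print(build_load_statement("/data/sales/2020.txt", "sales"))
```

The resulting string could be run through any Hive client. Setting overwrite=True adds the OVERWRITE keyword, which replaces existing rows instead of appending to them.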
To load the data into a new table, you must define the schema of the table.
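In HiveQL terms, a table schema amounts to a list of column names and types plus a storage format. The Python sketch below assembles a CREATE TABLE statement from such a definition and checks the format against the list of supported formats given earlier. The helper build_create_table, the table name customers, and its columns are assumptions made for illustration; they are not part of the product.

```python
# File formats listed in this topic, written as HiveQL STORED AS keywords.
SUPPORTED_FORMATS = {"TEXTFILE", "SEQUENCEFILE", "ORC", "RCFILE", "PARQUET", "AVRO"}


def build_create_table(table, columns, file_format="TEXTFILE"):
    """Return a HiveQL CREATE TABLE statement.

    columns is a list of (name, hive_type) pairs, for example ("id", "INT").
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    if not columns:
        raise ValueError("a schema needs at least one column")
    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE {table} ({column_list}) STORED AS {fmt}"


# Hypothetical schema: two flat columns stored as ORC.
print(build_create_table("customers", [("id", "INT"), ("name", "STRING")], "ORC"))
```

The example uses only flat column types such as INT and STRING.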
Note: Spectrum does not support hierarchical data, even though Hive supports it.