Big Data Integration Module

New Activities

Run Hadoop MapReduce Job

The Run Hadoop MapReduce Job activity runs a MapReduce job on a Hadoop cluster. You can now run a MapReduce job built with the Spectrum™ Big Data Quality SDK or any external MapReduce job.
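For context, this is the kind of submission the activity automates. A minimal command-line equivalent is sketched below; the jar name, driver class, and HDFS paths are placeholders, not Spectrum artifacts:

```shell
# Submit a MapReduce job jar to the cluster.
# my-mapreduce-job.jar, com.example.MyJobDriver, and the HDFS
# input/output paths are hypothetical examples.
hadoop jar my-mapreduce-job.jar com.example.MyJobDriver \
    /input/records \
    /output/results
```

The activity performs the same submission through the workflow designer, so no manual command-line access to the cluster is required.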

Submit Spark Job

The Submit Spark Job activity runs a Spark job built with the Spectrum™ Big Data Quality SDK or any external Spark job.

The Spark job can run either on a Hadoop cluster (managed by YARN) or on a standalone Spark cluster. In either case, the job's driver process can run on a cluster host (cluster deploy mode) or on the client system (client deploy mode).
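The cluster-manager and deploy-mode combinations correspond to standard spark-submit options; a sketch follows, with the jar, class, and master host as placeholders:

```shell
# YARN cluster manager, driver running on a cluster host
# (cluster deploy mode). Jar and class names are placeholders.
spark-submit --master yarn --deploy-mode cluster \
    --class com.example.SparkJob my-spark-job.jar

# Standalone Spark cluster, driver running on the client system
# (client deploy mode). The master host and port are placeholders.
spark-submit --master spark://spark-master:7077 --deploy-mode client \
    --class com.example.SparkJob my-spark-job.jar
```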

Spark Sorter

The new Spark Sorter activity uses Apache Spark libraries to sort very large volumes of records.
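Conceptually, the distributed sort this activity performs resembles Spark's RDD sort API. A minimal spark-shell sketch is shown below; the input and output paths are assumptions:

```shell
# Sort a text dataset line-by-line with Spark and write the result.
# /input/records and /output/sorted are placeholder HDFS paths.
spark-shell <<'EOF'
val records = sc.textFile("/input/records")
records.sortBy(line => line).saveAsTextFile("/output/sorted")
EOF
```

Spark partitions the data across executors and performs a range-partitioned sort, which is what allows the activity to handle datasets far larger than a single machine's memory.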

New Knox Gateway Data Source

You can now add connections to Knox-authenticated Hadoop services through Management Console. Once a Knox Gateway connection is created, you can use it to access data on the associated Hadoop cluster through the Knox gateway.
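Behind the scenes, Knox proxies Hadoop REST services over a single HTTPS endpoint. As an illustration of the access pattern the connection uses, a WebHDFS directory listing through a Knox gateway might look like the following; the host, port, topology name ("default"), path, and credentials are all placeholders:

```shell
# List an HDFS directory via WebHDFS, proxied through the Knox
# gateway over HTTPS. -k skips certificate verification (for test
# environments only); -u supplies the Knox-authenticated credentials.
curl -iku myuser:mypassword \
    "https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
```

The Management Console connection stores these details once, so flows can reference the connection rather than embedding gateway URLs and credentials.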