Big Data Integration Module
New Activities
Run Hadoop MapReduce Job
The new Run Hadoop MapReduce Job activity runs a MapReduce job on a Hadoop cluster. The job can be one created with the Spectrum™ Big Data Quality SDK or any external MapReduce job.
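For context, an external MapReduce job is typically submitted to a Hadoop cluster from the command line as sketched below; the jar name, driver class, and HDFS paths are placeholders, not names from this product.

```shell
# Submit an external MapReduce job to the cluster.
# Jar, driver class, and input/output HDFS paths are hypothetical examples.
hadoop jar my-mapreduce-job.jar com.example.MyDriver \
    /input/path /output/path
```

The Run Hadoop MapReduce Job activity wraps this kind of submission so it can be invoked from a Spectrum dataflow instead of the command line.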
Submit Spark Job
The Submit Spark Job activity runs a Spark job created with the Spectrum™ Big Data Quality SDK or any external Spark job. The driver of the Spark job can run either on a Hadoop cluster or on a Spark cluster. You can use either YARN or Spark as the cluster manager, and run the driver on a cluster host or on the client system.
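As a point of reference, the equivalent command-line submission with the standard `spark-submit` tool looks like the following sketch; the class and jar names are hypothetical, while the flags are standard Spark options.

```shell
# Submit a Spark job through YARN with the driver on the cluster
# (cluster deploy mode); use "--deploy-mode client" to keep the
# driver on the client system instead. Class and jar are placeholders.
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.MySparkJob \
    my-spark-job.jar /input/path /output/path
```

The `--master` and `--deploy-mode` options correspond to the choices described above: YARN versus a standalone Spark master, and a cluster-side versus client-side driver.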
Spark Sorter
The new Spark Sorter activity uses Apache Spark libraries to sort massive amounts of records.
New Knox Gateway Data Source
You can now add connections to Knox-authenticated Hadoop services through Management Console. Once a Knox Gateway connection is created, you can use the connection to access data on the associated Hadoop clusters through the Knox gateway.
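Under the hood, a Knox gateway exposes Hadoop services such as WebHDFS over HTTPS at a topology-scoped URL. A direct request through the gateway looks roughly like the sketch below; the host, port, topology name (`default`), and credentials are placeholders.

```shell
# List an HDFS directory via WebHDFS, routed through the Knox gateway.
# Host, port, topology, path, and credentials are hypothetical examples.
curl -ku user:password \
    "https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
```

A Knox Gateway connection defined in Management Console supplies this gateway URL and the credentials, so dataflows can reach the cluster without addressing individual Hadoop services directly.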