Package com.precisely.bigdata.addressing.spark.api

package api

Provides classes that simplify usage of the Java Addressing API. Example usage of the UDF in Spark:

import com.precisely.bigdata.addressing.spark.api.AddressingBuilder
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, lit, map}
// plus the SDK imports for DownloadManagerBuilder, S3Downloader, HDFSDownloader,
// LocalFilePassthroughDownloader and PreferencesBuilder

object GeocodeExample {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName(this.getClass.getName).getOrCreate()

    //build a download manager capable of downloading remote resources to a node local path
    val downloadManager = new DownloadManagerBuilder("/addressing/downloads")
      .addDownloader(new S3Downloader(session.sparkContext.hadoopConfiguration))
      .addDownloader(new HDFSDownloader(session.sparkContext.hadoopConfiguration))
      .addDownloader(new LocalFilePassthroughDownloader())
      .build()

    //read the input records
    val input = session.read.option("header", true).csv("hdfs:///addressing/input/")

    //build the udf
    val geocodeUdf: UserDefinedFunction = new AddressingBuilder()
      .withDownloadManager(downloadManager)
      .withResourcesLocation("hdfs:///addressing/resources/")
      .withDataLocations("hdfs:///addressing/reference_data/")
      .udfBuilder()
      //customize the preferences to return extra information such as PB_KEY
      .withPreferences(new PreferencesBuilder().withReturnAllInfo(true).build())
      .withOutputFields(
        "address.formattedStreetAddress as formattedStreetAddress",
        "address.formattedLocationAddress as formattedLocationAddress",
        "location.feature.geometry.coordinates.x as x",
        "location.feature.geometry.coordinates.y as y",
        "customFields['PB_KEY'] as 'PB_KEY'"
      )
      .withErrorField("error")
      .forGeocode()

    val output = input.withColumn("result", geocodeUdf(map(
      lit("addressLines[0]"), col("streetAddress"),
      lit("addressLines[1]"), col("locationAddress")
    )))
      //persist so that we don't run the UDF once per output field when the result struct is unrolled below
      .persist()
      .select("*", "result.*").drop("result")

    output.write.mode(SaveMode.Overwrite).option("header", true).csv("hdfs:///addressing/output")
  }
}
Linear Supertypes
AnyRef, Any
Type Members

  1. class AddressingBuilder extends AnyRef

    This class allows you to either build an AddressingProvider for use in your own code, or branch off into a UDFBuilder for a UDF that executes an addressing operation (see the sketches after this member list).

  2. trait AddressingExecutor extends Serializable

    An implementation of a custom Addressing API operation.

  3. trait AddressingProvider extends Serializable

    A serializable provider for the Addressing API. Because it is serializable, an instance can be used inside Spark tasks for direct Addressing API calls (see the second sketch after this member list).

  4. trait RequestInput extends Serializable

    Provides access to the input for a given addressing operation, as well as the option to build a RequestAddress.

  5. class UDFBuilder extends AnyRef

    This class allows you to build a UDF that executes an addressing operation. You can obtain an instance of this builder by starting with an AddressingBuilder.
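
The sketch below shows, in isolation, the UDF branch described in the AddressingBuilder and UDFBuilder entries above. It uses only builder methods that appear in the package example; the HDFS paths are placeholders and the download-manager setup from that example is omitted for brevity.

import com.precisely.bigdata.addressing.spark.api.AddressingBuilder
import org.apache.spark.sql.expressions.UserDefinedFunction

// Minimal sketch: configure the AddressingBuilder, call udfBuilder() to branch
// into a UDFBuilder, then finish with forGeocode() to obtain the Spark UDF.
// The download-manager configuration from the package example is omitted here;
// a real job will typically need it to fetch remote resources to each node.
val geocodeUdf: UserDefinedFunction = new AddressingBuilder()
  .withResourcesLocation("hdfs:///addressing/resources/")   // placeholder path
  .withDataLocations("hdfs:///addressing/reference_data/")  // placeholder path
  .udfBuilder()            // branch off into a UDFBuilder
  .withErrorField("error") // surface failures in an "error" output column
  .forGeocode()            // build the geocoding UDF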
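
The AddressingProvider entry above notes that the provider is serializable and can therefore be used inside Spark tasks. The sketch below illustrates only that closure-capture pattern; the builder method that produces an AddressingProvider and the provider's own methods are not documented on this page, so they are deliberately not shown and the per-row call is indicated only by a comment.

import com.precisely.bigdata.addressing.spark.api.AddressingProvider
import org.apache.spark.sql.SparkSession

// Sketch: `provider` is assumed to have been obtained from an AddressingBuilder;
// the terminal builder method is not documented on this page, so it is not shown.
def geocodeDirectly(provider: AddressingProvider, session: SparkSession): Unit = {
  val input = session.read.option("header", true).csv("hdfs:///addressing/input/")
  input.rdd.foreachPartition { rows =>
    // Because AddressingProvider is Serializable, it can be captured by this task
    // closure and shipped to the executors for direct Addressing API usage.
    val localProvider: AddressingProvider = provider
    rows.foreach { _ =>
      // direct Addressing API calls would go through `localProvider` here; the
      // concrete methods are not documented on this page, so none are shown
      ()
    }
  }
}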
