Developer

Join

Join component allows you to join combine the rows of two incoming datasets.

Note: If you wish to join more than two tables, use MultiJoin component.

Example Usage

In the Join component

  • You add the Join condition at the top
  • You select the Type of join
  • You provide the selected columns in the join, using left. and right. to specify the correct column from an incoming datasets
    • You can select columns by clicking columns from incoming tables in the left bar
    • You can write any expressions you can write in a SQL select or reformat component

.

Join Types

Let’s start with the following two tables

pets

Name Price
Cat 200
Dog 500
Squirrel 100

available

Name Amount
Cat 1
Dog 3
Tiger 0

Here are the outputs of various joins (with columns available to select after join):

pets.join(available, “inner”)

.

Name Price Name Amount
Cat 200 Cat 1
Dog 500 Dog 3

pets.join(available, “outer”)

.

Name Price Name Amount
Cat 200 Cat 1
Dog 500 Dog 3
Squirrel 100 null null
null null Tiger 0

pets.join(available, “leftouter”)

.

Name Price Name Amount
Cat 200 Cat 1
Dog 500 Dog 3
Squirrel 100 null null

pets.join(available, “rightouter”)

.

Name Price Name Amount
Cat 200 Cat 1
Dog 500 Dog 3
null null Tiger 0

pets.join(available, “leftsemi”)

.

Name Price
Cat 200
Dog 500

pets.join(available, “leftanti”)

.

Name Price
Squirrel 100

.

Example Code

The join code in the code editor is fairly simple. Since join is so often followed by selection of resulting columns, we have decided to add select in the component.


object CombinebyCustomer {

  def apply(spark: SparkSession, left: DataFrame, right: DataFrame): Join = {
    import spark.implicits._

    val leftAlias  = left.as("left")
    val rightAlias = right.as("right")
    val dfJoin     = leftAlias.join(rightAlias, col("right.customer_id") === col("left.customer_id"), "inner")

    val out = dfJoin.select(
      col("left.first_name").as("first_name"),
      col("left.last_name").as("last_name"),
      col("right.amount").as("amount"),
      col("left.customer_id").as("customer_id")
    )

    out

  }

}
    

# Coming soon!