
SetOperation

Spark Gem

Use the SetOperation gem to combine DataFrames that share an identical schema but hold different data, either adding rows together or removing rows from one input based on the others.

Parameters

Parameter         Description
DataFrame 1       First input DataFrame
DataFrame 2       Second input DataFrame
DataFrame N       Nth input DataFrame
Operation type    The set operation to apply:
  • Union: Returns a DataFrame containing the rows from every input DataFrame, preserving duplicates.
  • Intersect All: Returns a DataFrame containing the rows present in all of the input DataFrames, preserving duplicates.
  • Except All: Returns a DataFrame containing the rows in the first DataFrame that are not in the other DataFrames, preserving duplicates.
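
What "preserving duplicates" means for each operation can be sketched without Spark by treating each input as a bag of rows. The following is only a plain-Python model of the semantics described above, not the gem's implementation; the helper names `union_all`, `intersect_all`, and `except_all` and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def union_all(*inputs):
    """Keep every row from every input, duplicates included."""
    total = Counter()
    for rows in inputs:
        total.update(rows)  # add the per-row counts together
    return total


def intersect_all(first, *others):
    """Keep each row as many times as it appears in the input where it is rarest."""
    result = Counter(first)
    for rows in others:
        result &= Counter(rows)  # minimum of the two counts per row
    return result


def except_all(first, *others):
    """Start from the first input and remove one copy per matching row in the others."""
    result = Counter(first)
    for rows in others:
        result -= Counter(rows)  # subtract counts; rows that reach zero are dropped
    return result


# Hypothetical sample data: two inputs with the same two-column shape.
in0 = [("a", 1), ("a", 1), ("b", 2)]
in1 = [("a", 1), ("c", 3)]

print(sorted(union_all(in0, in1).elements()))
# [('a', 1), ('a', 1), ('a', 1), ('b', 2), ('c', 3)]
print(sorted(intersect_all(in0, in1).elements()))
# [('a', 1)]
print(sorted(except_all(in0, in1).elements()))
# [('a', 1), ('b', 2)]
```

Note that Except All removes only one copy of `("a", 1)` because the second input contains it once, which is the difference between the "All" variants and their de-duplicating counterparts.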

To add more input DataFrames, you can click the + icon on the left sidebar.
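
With more than two inputs, one reading consistent with the operation descriptions is that the two-input method is applied from left to right, starting with the first DataFrame. The sketch below assumes that reading; `apply_across_inputs` and `FakeFrame` are hypothetical names, and `FakeFrame` is only a stand-in so the example runs without a Spark session.

```python
from collections import Counter
from functools import reduce


def apply_across_inputs(method_name, frames):
    """Apply a two-input method (e.g. "exceptAll") left to right across all inputs."""
    frames = list(frames)
    if len(frames) < 2:
        raise ValueError("a set operation needs at least two inputs")
    return reduce(lambda acc, frame: getattr(acc, method_name)(frame), frames)


class FakeFrame:
    """Stand-in for a DataFrame (hypothetical); models rows as a bag."""

    def __init__(self, rows):
        self.rows = Counter(rows)

    def exceptAll(self, other):
        result = FakeFrame([])
        result.rows = self.rows - other.rows  # remove one copy per matching row
        return result


in0 = FakeFrame(["a", "a", "b", "c"])
in1 = FakeFrame(["a"])
in2 = FakeFrame(["c"])

# Equivalent to in0.exceptAll(in1).exceptAll(in2)
remaining = apply_across_inputs("exceptAll", [in0, in1, in2])
print(sorted(remaining.rows.elements()))  # ['a', 'b']
```

With real PySpark DataFrames the same helper could be called as `apply_across_inputs("exceptAll", [in0, in1, in2])`. Order matters only for Except All, since the first input is the one rows are removed from.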

Set Operation - Add input dataframe

Examples

Operation Type: Union

Example usage of Set Operation - Union

from pyspark.sql import DataFrame, SparkSession

def union(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.unionAll(in1)

Operation Type: Intersect All

Example usage of Set Operation - Intersect All

from pyspark.sql import DataFrame, SparkSession

def intersectAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.intersectAll(in1)

Operation Type: Except All

Example usage of Set Operation - Except All

from pyspark.sql import DataFrame, SparkSession

def exceptAll(spark: SparkSession, in0: DataFrame, in1: DataFrame) -> DataFrame:
    return in0.exceptAll(in1)