RowDistributor
Spark Gem
Use the RowDistributor Gem to create multiple DataFrames from an input DataFrame based on the filter conditions you provide.
This is useful when rows from the input DataFrame need to be distributed into multiple DataFrames in different ways for downstream Gems.
Parameters
Parameter | Description | Required |
---|---|---|
DataFrame | Input DataFrame whose rows need to be distributed into multiple DataFrames | True |
Filter Conditions | Boolean-type column or boolean expression for each output tab. Supports SQL, Python, and Scala expressions | True |
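As the generated code on this page suggests, each output is produced by applying its own filter condition to the same input, independently of the other outputs. The following is a minimal plain-Python sketch of that behavior, not the Gem's implementation: the helper `distribute_rows`, the sample `orders` data, and the `Cancelled` status are hypothetical and exist only for illustration. Under that assumption, a row that matches no condition appears in no output, and a row that matches several conditions appears in several outputs.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def distribute_rows(
    rows: List[Row], conditions: List[Callable[[Row], bool]]
) -> Tuple[List[Row], ...]:
    """Hypothetical helper: return one filtered list per condition.

    Every condition is evaluated against the full input, so the outputs
    are independent of each other (assumed to mirror the generated code).
    """
    return tuple([row for row in rows if cond(row)] for cond in conditions)


# Illustrative sample data; "Cancelled" is an assumed extra status.
orders = [
    {"order_id": "1", "order_status": "Started"},
    {"order_id": "2", "order_status": "Approved"},
    {"order_id": "3", "order_status": "Finished"},
    {"order_id": "4", "order_status": "Cancelled"},
]

# Three outputs, one per status, as in the example on this page.
started, approved, finished = distribute_rows(
    orders,
    [
        lambda r: r["order_status"] == "Started",
        lambda r: r["order_status"] == "Approved",
        lambda r: r["order_status"] == "Finished",
    ],
)

# Overlapping conditions: order 1 satisfies both, so it lands in both outputs.
open_orders, started_only = distribute_rows(
    orders,
    [
        lambda r: r["order_status"] in ("Started", "Approved"),
        lambda r: r["order_status"] == "Started",
    ],
)
```

In this sketch, order 4 matches none of the three status conditions and is therefore absent from all three outputs. If every row must land somewhere, one option is to add a final output whose condition is the negation of the others.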
Example
info
The number of outputs can be changed as needed by clicking the + button.
Generated Code
- Python
- Scala
def RowDistributor(spark: SparkSession, in0: DataFrame) -> (DataFrame, DataFrame, DataFrame):
    df1 = in0.filter((col("order_status") == lit("Started")))
    df2 = in0.filter((col("order_status") == lit("Approved")))
    df3 = in0.filter((col("order_status") == lit("Finished")))
    return df1, df2, df3
object RowDistributor {
  def apply(
    spark: SparkSession,
    in: DataFrame
  ): (DataFrame, DataFrame, DataFrame) =
    (in.filter(col("order_status") === lit("Started")),
     in.filter(col("order_status") === lit("Approved")),
     in.filter(col("order_status") === lit("Finished"))
    )
}