Filter gem
ProphecySparkBasicsPython 0.0.1+, ProphecySparkBasicsScala 0.0.1+, UC Dedicated Cluster 14.3+, UC Standard Cluster 14.3+, Livy 3.0.1+
Filters a DataFrame based on the provided filter condition.
Parameters
Parameter | Description | Required |
---|---|---|
DataFrame | Input DataFrame on which the filter condition will be applied. | True |
Filter Condition | BooleanType column or boolean expression. Supports SQL, Python and Scala expressions. | True |
Note: Use the visual language syntax to call configuration variables in the Filter gem.
Example
In this example, the Filter gem is used to return only marketing orders that are either finished or approved, while excluding any orders that have been discounted.
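To make the condition concrete, here is a minimal plain-Python sketch (no Spark involved) that evaluates the same rule on a handful of rows. The sample rows, the `order_id` field, and the `keep_order` helper are hypothetical and exist only for illustration; the other column names follow the Spark code on this page.

```python
# Hypothetical sample rows; column names follow the Spark code on this page.
orders = [
    {"order_id": 1, "order_category": "Marketing", "order_status": "Finished", "is_discounted": False},
    {"order_id": 2, "order_category": "Marketing", "order_status": "Approved", "is_discounted": True},
    {"order_id": 3, "order_category": "Marketing", "order_status": "Pending", "is_discounted": False},
    {"order_id": 4, "order_category": "Sales", "order_status": "Finished", "is_discounted": False},
    {"order_id": 5, "order_category": "Marketing", "order_status": "Approved", "is_discounted": False},
]


def keep_order(row: dict) -> bool:
    """Return True when a row satisfies the filter condition described above."""
    return (
        row["order_category"] == "Marketing"
        and row["order_status"] in ("Finished", "Approved")
        and not row["is_discounted"]
    )


# Only rows 1 and 5 are Marketing, Finished or Approved, and not discounted.
kept_ids = [row["order_id"] for row in orders if keep_order(row)]
print(kept_ids)  # [1, 5]
```

All three parts of the condition must hold at once: row 2 is dropped for being discounted, row 3 for its status, and row 4 for its category.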
Info: The Filter gem configuration translates into the Spark code shown below, which applies the same filtering logic.
Spark code
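Python

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import col, lit

def Filter_Orders(spark: SparkSession, in0: DataFrame) -> DataFrame:
    # Keep Marketing orders that are Finished or Approved and not discounted.
    return in0.filter(
        (
            (
                (col("order_category") == lit("Marketing"))
                & ((col("order_status") == lit("Finished")) | (col("order_status") == lit("Approved")))
            )
            & ~col("is_discounted")
        )
    )

Scala

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object Filter_Orders {
  // Keep Marketing orders that are Finished or Approved and not discounted.
  def apply(spark: SparkSession, in: DataFrame): DataFrame =
    in.filter(
      (col("order_category") === lit("Marketing"))
        .and(
          (col("order_status") === lit("Finished"))
            .or(col("order_status") === lit("Approved"))
        )
        .and(!col("is_discounted"))
    )
}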
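The parameter table lists SQL as a supported expression language for the Filter Condition. As a rough sketch, the same condition could be written as a Spark SQL expression string and passed to `DataFrame.filter`, which accepts either a `Column` or a SQL string. The constant `ORDERS_FILTER_SQL` and the helper `filter_orders_sql` are hypothetical names used for illustration; this is not necessarily the code the gem generates.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import DataFrame

# Same condition as the code above, written as a Spark SQL expression string.
ORDERS_FILTER_SQL = (
    "order_category = 'Marketing' "
    "AND order_status IN ('Finished', 'Approved') "
    "AND NOT is_discounted"
)


def filter_orders_sql(in0: "DataFrame") -> "DataFrame":
    """Hypothetical helper: apply the SQL-string form of the filter condition."""
    # DataFrame.filter accepts either a Column or a SQL expression string.
    return in0.filter(ORDERS_FILTER_SQL)
```

The `IN` list stands in for the two `order_status` equality checks joined with `or`, and `NOT is_discounted` assumes `is_discounted` is a boolean column, as in the code above.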