
Schema Transform

SchemaTransform adds, edits (replaces), renames, or drops columns of the incoming DataFrame.

Unlike Reformat, which is a set operation where all transforms are applied in parallel, SchemaTransform applies its transformations in order. Reformat is equivalent to a single SQL SELECT and is preferable when making many changes.
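To make the difference between in-order and parallel application concrete, here is a toy model in plain Python (no Spark involved). A DataFrame is modeled as a list of dict rows, and the column names `salary`, `bonus_pct`, `bonus`, and `total` are hypothetical. It models the semantics described above, not Spark's actual execution: in the sequential version a later step can read a column created by an earlier one, while in the select-style version every expression reads the same original row.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def apply_in_order(rows: List[Row], steps: List[Callable[[Row], Row]]) -> List[Row]:
    """SchemaTransform-style: each step sees the output of the previous step."""
    out = []
    for row in rows:
        for step in steps:
            row = step(row)
        out.append(row)
    return out


def apply_as_select(rows: List[Row], exprs: Dict[str, Callable[[Row], object]]) -> List[Row]:
    """Reformat-style: every output column is computed from the same input row."""
    return [{name: fn(row) for name, fn in exprs.items()} for row in rows]


# Hypothetical input data.
rows = [{"salary": 1000, "bonus_pct": 10}]

# Sequential: the second step can use "bonus", which the first step created.
seq = apply_in_order(rows, [
    lambda r: {**r, "bonus": r["salary"] * r["bonus_pct"] // 100},
    lambda r: {**r, "total": r["salary"] + r["bonus"]},
])
print(seq)

# Parallel: "total" cannot see "bonus", because both expressions read the original row.
select_error: Optional[KeyError] = None
try:
    apply_as_select(rows, {
        "bonus": lambda r: r["salary"] * r["bonus_pct"] // 100,
        "total": lambda r: r["salary"] + r["bonus"],
    })
except KeyError as missing:
    select_error = missing
    print("select failed, missing column:", missing)
```

In this model, getting the same result in a single select means writing `total` in terms of the input columns only, which is one reason an ordered transform can be easier to read when columns build on each other.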

Parameters

| Parameter | Description | Required |
|---|---|---|
| DataFrame | Input DataFrame | Yes |
| Operation | Operation to apply: Add/Replace Column, Rename Column, or Drop Column | Required if a transformation is added |
| New Column | Output column name (when Add/Replace Column is selected) | Required if Add/Replace Column is selected |
| Expression | Expression used to generate the new column (when Add/Replace Column is selected) | Required if Add/Replace Column is selected |
| Old Column Name | Column to be renamed (when Rename Column is selected) | Required if Rename Column is selected |
| New Column Name | New name for the renamed column (when Rename Column is selected) | Required if Rename Column is selected |
| Column to drop | Column to be dropped (when Drop Column is selected) | Required if Drop Column is selected |
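The three operations in the table correspond naturally to the PySpark DataFrame methods `withColumn`, `withColumnRenamed`, and `drop`, as the Spark Code example also shows. The sketch below is a hypothetical, config-driven way to express that mapping; the classes `AddReplace`, `Rename`, and `Drop` and the function `schema_transform` are illustrative names, not part of any product API. It does not import PySpark: it works with any object exposing those three methods, and `functools.reduce` applies the operations left to right so each one sees the previous result.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, List, Union


# Hypothetical representations of the three operations in the table above.
@dataclass
class AddReplace:
    new_column: str
    expression: Any  # in PySpark, typically a pyspark.sql.Column


@dataclass
class Rename:
    old_column_name: str
    new_column_name: str


@dataclass
class Drop:
    column_to_drop: str


Operation = Union[AddReplace, Rename, Drop]


def apply_operation(df: Any, op: Operation) -> Any:
    """Map one operation onto the matching DataFrame method."""
    if isinstance(op, AddReplace):
        return df.withColumn(op.new_column, op.expression)
    if isinstance(op, Rename):
        return df.withColumnRenamed(op.old_column_name, op.new_column_name)
    if isinstance(op, Drop):
        return df.drop(op.column_to_drop)
    raise TypeError(f"unsupported operation: {op!r}")


def schema_transform(df: Any, ops: List[Operation]) -> Any:
    """Apply operations in list order; each one receives the previous result."""
    return reduce(apply_operation, ops, df)
```

With PySpark, calling `schema_transform(in0, [AddReplace("business_date", to_date(lit("2022-05-05"), "yyyy-MM-dd")), Rename("bonus_rate", "bonus"), Drop("slug")])` should build the same chain of calls as the Spark Code example.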

Example

Example usage of SchemaTransform

Spark Code

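from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.functions import lit, to_date

def transform(spark: SparkSession, in0: DataFrame) -> DataFrame:
    # Operations run in order: add business_date, rename bonus_rate to bonus, drop slug
    return in0\
        .withColumn("business_date", to_date(lit("2022-05-05"), "yyyy-MM-dd"))\
        .withColumnRenamed("bonus_rate", "bonus")\
        .drop("slug")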
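To see how the example changes the schema step by step, the plain-Python sketch below tracks only column names. The input columns are partly hypothetical: `bonus_rate` and `slug` come from the example, while `employee_id` is made up. The tuple encoding of operations is also an illustrative assumption. It mirrors how `withColumn` appends a new column (or replaces an existing one in place), how `withColumnRenamed` keeps the column's position, and how `drop` removes it.

```python
from typing import List, Tuple

# Hypothetical operation encoding: ("add", name), ("rename", old, new), ("drop", name)
Op = Tuple[str, ...]


def trace_schema(columns: List[str], ops: List[Op]) -> List[List[str]]:
    """Return the column list before any operation and after each one."""
    cols = list(columns)
    history = [list(cols)]
    for kind, *args in ops:
        if kind == "add":
            # A new column is appended; an existing name is replaced in place.
            if args[0] not in cols:
                cols.append(args[0])
        elif kind == "rename":
            old, new = args
            cols = [new if c == old else c for c in cols]
        elif kind == "drop":
            cols = [c for c in cols if c != args[0]]
        else:
            raise ValueError(f"unknown operation: {kind}")
        history.append(list(cols))
    return history


# employee_id is hypothetical; bonus_rate and slug come from the Spark example.
steps = trace_schema(
    ["employee_id", "bonus_rate", "slug"],
    [("add", "business_date"), ("rename", "bonus_rate", "bonus"), ("drop", "slug")],
)
for cols in steps:
    print(cols)
```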