User-defined functions
Lets you create user-defined functions (UDFs) that can then be used anywhere in the Pipeline.
Parameters
Parameter | Description | Required |
---|---|---|
UDF Name | Name under which the UDF is registered. All calls to the UDF use this name. | True |
Definition | Definition of the UDF, e.g. udf((value:Int)=>value*value) | True |
UDF initialization code | Code block that initializes entities used by UDFs, for example a static mapping that a UDF looks up. | False |
Examples
Defining and using a UDF
Step 1 - Open UDF definition window
Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Static mapping defined in the UDF initialization code
country_code_map = {"Mexico": "MX", "USA": "US", "India": "IN"}

def registerUDFs(spark: SparkSession):
    spark.udf.register("get_country_code", get_country_code)

@udf(returnType=StringType())
def get_country_code(country: str):
    return country_code_map.get(country, "Not Found")

Scala

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UDFs extends Serializable {
  // Static mapping defined in the UDF initialization code
  val country_code_map = Map("Mexico" -> "MX", "USA" -> "US", "India" -> "IN")

  def registerUDFs(spark: SparkSession) =
    spark.udf.register("get_country_code", get_country_code)

  def get_country_code =
    udf { (country: String) =>
      country_code_map.getOrElse(country, "Not Found")
    }
}
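
Because the body of the Python UDF is ordinary Python, its lookup logic can be checked without a Spark session before the UDF is registered. The sketch below is an illustration under stated assumptions, not generated Pipeline code: `lookup_country_code` is a hypothetical helper name that mirrors the body of `get_country_code` above without the `@udf` decorator.

```python
# Static mapping, as it would appear in the UDF initialization code.
country_code_map = {"Mexico": "MX", "USA": "US", "India": "IN"}


def lookup_country_code(country):
    """Hypothetical plain-Python twin of get_country_code (no @udf wrapper)."""
    # dict.get returns the fallback for any key missing from the mapping,
    # including None, which is how PySpark passes SQL NULL to a Python UDF.
    return country_code_map.get(country, "Not Found")


# Exercise a known key, an unknown key, and a null input.
for name in ["India", "France", None]:
    print(name, "->", lookup_country_code(name))
```

Once the UDF is registered, calls refer to it by its registered name, here `get_country_code`. Keeping the mapping in the initialization code means it is defined once rather than rebuilt on every call.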