# Lookup
Lookup gems let you mark a DataFrame as a broadcast DataFrame. Spark replicates this data to every computation node, so you can look up values from it without shuffling data across the cluster.

Under the hood, Prophecy implements lookups as user-defined functions (UDFs). To learn about UDF support in Databricks, see our documentation on cluster access modes.
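Conceptually, a lookup behaves like a join against a small broadcast table. The sketch below shows that idea in plain PySpark; it is illustrative only, not the code Prophecy generates, and the DataFrames and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Hypothetical lookup table: customer_id -> order_category.
lookup_df = spark.createDataFrame(
    [("0000", "retail"), ("0001", "wholesale")],
    ["customer_id", "order_category"],
)

# Hypothetical source data to enrich.
orders_df = spark.createDataFrame(
    [("0000", 12.5), ("0001", 99.0)],
    ["c_id", "amount"],
)

# broadcast() ships the small table to every executor, so the join
# resolves locally on each node without shuffling orders_df.
result = orders_df.join(
    broadcast(lookup_df),
    orders_df["c_id"] == lookup_df["customer_id"],
    "left",
)
```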
## Parameters
| Parameter | Description |
|---|---|
| Range Lookup | Whether to match keys against a range defined by minimum and maximum value columns, rather than against an exact key. See the sketch after this table. |
| Key Columns | One or more columns to use as the lookup key in the source DataFrame. |
| Value Columns | Columns whose values the lookup returns wherever you use this Lookup gem. |
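For illustration, a range lookup matches a key when it falls between a minimum and a maximum column in the source DataFrame. Below is a minimal plain-PySpark sketch of that matching logic, reusing `spark` from the sketch above; the column names `min_id` and `max_id` are hypothetical:

```python
# Hypothetical range table: each row covers the span [min_id, max_id].
ranges_df = spark.createDataFrame(
    [(0, 99, "small"), (100, 999, "large")],
    ["min_id", "max_id", "order_category"],
)

ids_df = spark.createDataFrame([(42,), (250,)], ["c_id"])

# A range lookup is equivalent to joining on a between-style condition:
# the key matches a row when min_id <= c_id <= max_id.
ranged = ids_df.join(
    ranges_df,
    (ids_df["c_id"] >= ranges_df["min_id"])
    & (ids_df["c_id"] <= ranges_df["max_id"]),
    "left",
)
```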
## Use a Lookup gem
After creating a Lookup gem, you can reference the lookup in the expressions of other gems.
### Column-based lookups
Assume you created a Lookup gem named MyLookup, with customer_id as the key column and order_category as one of the value columns.
To perform a column-based lookup, use:
Python:

```python
lookup("MyLookup", col("customer_id")).getField("order_category")
```

Scala:

```scala
lookup("MyLookup", col("customer_id")).getField("order_category")
```

SQL:

```sql
MyLookup(customer_id)['order_category']
```
Assume you also have a Reformat gem with a column named category set to the expression MyLookup(c_id)['order_category'] in SQL expression mode. For each row, the Lookup gem compares the value in the c_id column against the customer_id key column and returns the matching order_category value, which becomes the new category column.
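As a rough plain-PySpark equivalent of that Reformat expression, reusing the hypothetical orders_df and lookup_df from the first sketch, the category column could be derived with a broadcast join:

```python
# Equivalent of category = MyLookup(c_id)['order_category']:
# match c_id against the lookup key, keep only the looked-up value.
with_category = (
    orders_df.join(
        broadcast(lookup_df),
        orders_df["c_id"] == lookup_df["customer_id"],
        "left",
    )
    .withColumnRenamed("order_category", "category")
    .drop("customer_id")
)
```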
### Literal lookups
The key you pass to a lookup can be any column expression, including a literal. This means you can perform lookups with static keys:
Python:

```python
lookup("MyLookup", lit("0000")).getField("order_category")
```

Scala:

```scala
lookup("MyLookup", lit("0000")).getField("order_category")
```

SQL:

```sql
MyLookup('0000')['order_category']
```
This expression evaluates to the value of order_category where customer_id is '0000'. This is useful when you want a table of predefined keys and their values available in expressions.
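Because the key is a constant, the looked-up value is a single scalar. Here is a minimal plain-PySpark sketch of the same idea, again using the hypothetical lookup_df and orders_df from above:

```python
from pyspark.sql.functions import col, lit

# Fetch the single order_category for the static key '0000'.
row = (
    lookup_df.filter(col("customer_id") == "0000")
    .select("order_category")
    .first()
)
default_category = row["order_category"] if row else None

# Attach the constant looked-up value to every row.
orders_with_default = orders_df.withColumn(
    "default_category", lit(default_category)
)
```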