
BulkColumnExpressions

Spark Gem

The BulkColumnExpressions Gem primarily lets you cast multiple columns to a new data type at once. It also provides additional functionality, including:

  • Adding a prefix or suffix to selected columns.
  • Applying a custom expression to selected columns.

Parameters

Parameter | Description
Data Type of the columns to do operations on | The data type of columns to select.
Selected Columns | The columns on which to apply transformations.
Change output column name | An option to add a prefix or suffix to the selected column names.
Change output column type | The data type that the columns will be transformed into.
Output Expression | A Spark SQL expression that can be applied to the selected columns. This field is required. If you only want to select the column, use column_value as the expression.
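To make the Output Expression and the name-change option more concrete, here is a minimal Python sketch of how one expression could be expanded per column. It assumes that column_value is a placeholder for the column currently being processed and that the suffix is appended to the original name. The helper names (render_output_expression, rename_output) and the column names are hypothetical, and this is not the code the Gem actually generates.

```python
import re


def render_output_expression(expression, column):
    """Replace the column_value placeholder with a back-quoted column name.

    Assumption: the placeholder stands for the column being processed.
    """
    quoted = f"`{column}`"
    # A callable replacement keeps any special characters in the name literal.
    return re.sub(r"\bcolumn_value\b", lambda _match: quoted, expression)


def rename_output(column, prefix="", suffix=""):
    """Apply the optional prefix and suffix to an output column name."""
    return f"{prefix}{column}{suffix}"


# Hypothetical selected columns; the expression only selects each column.
selected = ["row_idx", "col_idx"]
select_list = []
for name in selected:
    body = render_output_expression("column_value", name)
    alias = rename_output(name, suffix="_new")
    select_list.append(f"{body} AS `{alias}`")

print(select_list)
```

In PySpark, a list of Spark SQL strings like this could be passed to DataFrame.selectExpr(*select_list), assuming the input DataFrame contains those columns.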

Example

Assume you have some columns in a table that represent zero-based indices and are stored with the long data type. You want them to represent one-based indices and be stored as integers to reduce memory use.

Using the BulkColumnExpressions Gem, you can:

  • Filter your columns by long data types.
  • Select the columns you wish to transform.
  • Cast the output column(s) to be integers.
  • Include column_value + 1 in the expression field to shift the indices.
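The steps above can be sketched as a Spark SQL query built in plain Python. This is an illustration only: the schema, the table name input_table, and the helper build_select are hypothetical, and it assumes that long columns are reported as bigint in Spark SQL, that column_value stands for the current column, and that columns not selected pass through unchanged.

```python
# Hypothetical input schema: column name -> Spark SQL type name.
schema = {
    "user_id": "string",
    "row_idx": "bigint",   # long columns appear as bigint in Spark SQL
    "col_idx": "bigint",
    "score": "double",
}


def build_select(schema, source_type, selected, expression, target_type):
    """Build a SELECT that transforms the selected columns of one data type."""
    parts = []
    for name, dtype in schema.items():
        if dtype == source_type and name in selected:
            # Substitute the placeholder, then cast the result.
            body = expression.replace("column_value", f"`{name}`")
            parts.append(f"CAST({body} AS {target_type}) AS `{name}`")
        else:
            # Assumption: untouched columns are passed through as-is.
            parts.append(f"`{name}`")
    return "SELECT " + ", ".join(parts) + " FROM input_table"


query = build_select(
    schema,
    source_type="bigint",
    selected={"row_idx", "col_idx"},
    expression="column_value + 1",
    target_type="int",
)
print(query)
```

Note that the cast wraps the whole expression, so the shift by one happens on the long value before it is narrowed to an integer.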