DataEncoderDecoder

The DataEncoderDecoder gem encodes or decodes data in selected columns using standard techniques, including Base64, hexadecimal, character-set encoding, and AES encryption. You can transform values in place or write them to new output columns named with a prefix or suffix.

Input and Output

The DataEncoderDecoder gem accepts the following input and output.

Port | Description
in0 | Input dataset containing one or more string or binary columns to encode or decode.
out | Output dataset with transformed values. Output columns depend on the transformed column option.

Parameters

Configure the DataEncoderDecoder gem using the following parameters.

Parameter | Description
Select columns to encode/decode | One or more columns to apply the transformation to.
Select encode/decode option | The encoding or decoding method to apply. See Encode/Decode methods.
Transformed column options | Choose how the output should be written. See Transformed column options.

Encode/Decode methods

Choose from the following methods:

base64

Encodes the selected column(s) using Base64.

unbase64

Decodes Base64-encoded column values.

hex

Encodes the selected column(s) into hexadecimal format.

unhex

Decodes hexadecimal-encoded values.
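
The four methods above form two encode/decode pairs. The following is a minimal sketch of that relationship using only the Python standard library as a stand-in; it is not the gem's implementation, and details such as the letter case of hexadecimal digits may differ from what the gem emits.

```python
import base64

# Sample value chosen for illustration only.
raw = "abc".encode("utf-8")

# base64 / unbase64: encode bytes to Base64 text, then decode back.
b64_text = base64.b64encode(raw).decode("ascii")
assert base64.b64decode(b64_text) == raw

# hex / unhex: encode bytes to hexadecimal text, then decode back.
hex_text = raw.hex()
assert bytes.fromhex(hex_text) == raw

print(b64_text)  # YWJj
print(hex_text)  # 616263
```

Each decoding method only recovers the original value when it is applied to data produced by the matching encoding method.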

encode

Encodes the string column(s) using a specified character set.

  • Charset: Character set to use, such as UTF-8.

decode

Decodes the string column(s) using a specified character set.

  • Charset: Character set to use, such as UTF-8.
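
To illustrate why the Charset option matters, here is a small sketch in plain Python (an assumption for illustration, not the gem's own code). The same text yields different bytes under different character sets, so decoding has to use the character set the data was encoded with.

```python
# Sample text containing one non-ASCII character (e with acute accent).
text = "caf\u00e9"

# encode: string -> bytes in the chosen character set.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# The same text has a different byte length in each character set.
print(len(utf8_bytes), len(latin1_bytes))  # 5 4

# decode: bytes -> string, using the matching character set.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("iso-8859-1") == text

# Decoding with the wrong character set silently produces different text.
mismatched = utf8_bytes.decode("iso-8859-1")
assert mismatched != text
```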

aes_encrypt

Encrypts the selected column(s) using AES encryption.

  • Secret scope: The name of the Databricks secret scope.
  • Secret key: The key name within the scope that stores the encryption key.
  • Mode: AES encryption mode to use.
    • GCM
    • CBC
    • ECB
  • (Optional) AAD scope and key: For GCM mode, the Databricks secret scope and key that store the additional authenticated data (AAD).
  • (Optional) Initialization vector scope and key: For CBC mode, the Databricks secret scope and key that store the initialization vector (IV).

Transformed column options

  • Substitute the new columns in place: Replaces the original column(s) with the transformed values.
  • Add new columns with a prefix/suffix attached: Adds a new column for each transformed input column, named by attaching the prefix or suffix to the original column name. The original columns are kept unchanged.

Example

Assume you have the following dataset:

ID | Message
1 | Hello world!
2 | Prophecy

Applying the base64 method and adding new columns with the suffix _encoded produces the following output:

ID | Message | Message_encoded
1 | Hello world! | SGVsbG8gd29ybGQh
2 | Prophecy | UHJvcGhlY3k=
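
The example can be reproduced outside the gem with a short Python sketch. The helper names b64 and add_encoded_columns are hypothetical and exist only for this illustration; the gem itself is configured through the parameters described earlier, not through this code.

```python
import base64

# The example input dataset, one dict per row.
rows = [
    {"ID": 1, "Message": "Hello world!"},
    {"ID": 2, "Message": "Prophecy"},
]


def b64(value: str) -> str:
    """Base64-encode a string, treating it as UTF-8 bytes."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def add_encoded_columns(rows, columns, suffix):
    """Return new rows with one extra `<column><suffix>` field per selected column."""
    out = []
    for row in rows:
        new_row = dict(row)  # keep the original columns unchanged
        for col in columns:
            new_row[col + suffix] = b64(row[col])
        out.append(new_row)
    return out


result = add_encoded_columns(rows, ["Message"], "_encoded")
for row in result:
    print(row)
```

With the in-place option, the encoded value would instead overwrite Message and no Message_encoded column would be added.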