DataEncoderDecoder
The DataEncoderDecoder gem encodes or decodes data in selected columns using standard techniques, including Base64, hexadecimal, character-set encoding, and AES encryption. You can transform values in place or write the results to new output columns named with a prefix or suffix.
The DataEncoderDecoder gem has a corresponding interactive gem example. See Interactive gem examples to learn how to run sample pipelines for this and other gems.
Input and Output
The DataEncoderDecoder gem accepts the following input and output.
| Port | Description |
|---|---|
| in0 | Input dataset containing one or more string or binary columns to encode or decode. |
| out | Output dataset with transformed values. Which columns appear depends on the Transformed column options setting. |
Parameters
Configure the DataEncoderDecoder gem using the following parameters.
| Parameter | Description |
|---|---|
| Select columns to encode/decode | One or more columns to apply the transformation to. |
| Select encode/decode option | The encoding or decoding method to apply. See Encode/Decode methods. |
| Transformed column options | Choose how the output should be written. See Transformed column options. |
Encode/Decode methods
Choose from the following methods:
base64
Encodes the selected column(s) using Base64.
unbase64
Decodes Base64-encoded column values.
hex
Encodes the selected column(s) into hexadecimal format.
unhex
Decodes hexadecimal-encoded values.
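As a rough illustration of what these four methods compute, the following Python sketch uses only the standard library and is not the gem's own implementation. The helper names are hypothetical, and the sketch assumes the column values are UTF-8 text and that hexadecimal output is lowercase; the gem's letter case may differ.

```python
import base64


def to_base64(value: str) -> str:
    """Encode text as Base64 (assumes UTF-8 input)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def from_base64(value: str) -> str:
    """Decode a Base64 string back to text."""
    return base64.b64decode(value).decode("utf-8")


def to_hex(value: str) -> str:
    """Encode text as a lowercase hexadecimal string."""
    return value.encode("utf-8").hex()


def from_hex(value: str) -> str:
    """Decode a hexadecimal string back to text."""
    return bytes.fromhex(value).decode("utf-8")


print(to_base64("Prophecy"))        # UHJvcGhlY3k=
print(from_base64("UHJvcGhlY3k="))  # Prophecy
print(to_hex("Hi"))                 # 4869
print(from_hex("4869"))             # Hi
```

Each decode helper reverses its matching encode helper, so a round trip returns the original value.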
encode
Encodes the string column(s) using a specified character set.
- Charset: Character set to use, such as UTF-8.
decode
Decodes the string column(s) using a specified character set.
- Charset: Character set to use, such as UTF-8.
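To show why the charset matters, here is a minimal Python sketch of character-set encoding and decoding. It uses Python's built-in codecs, not the gem, so the set of supported charset names is an assumption and may differ from what the gem accepts.

```python
text = "café"

# Encoding turns text into bytes; the bytes depend on the charset.
utf8_bytes = text.encode("utf-8")         # "é" becomes two bytes
latin1_bytes = text.encode("iso-8859-1")  # "é" becomes one byte

print(len(utf8_bytes), len(latin1_bytes))  # 5 4

# Decoding with the same charset restores the original text.
print(utf8_bytes.decode("utf-8"))          # café

# Decoding with a different charset silently produces garbled text.
print(utf8_bytes.decode("iso-8859-1"))     # cafÃ©
```

The last line is the usual symptom of a charset mismatch: no error is raised, but the decoded text is wrong, so the decode charset should match the one used to encode.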
aes_encrypt
Encrypts the selected column(s) using AES encryption.
- Secret scope: The name of the Databricks secret scope.
- Secret key: The key name within the scope that stores the encryption key.
- Mode: AES encryption mode to use: GCM, CBC, or ECB.
- (Optional) AAD scope and key: For GCM mode, specify a Databricks secret scope and key for the additional authenticated data (AAD).
- (Optional) Initialization vector scope and key: For CBC mode, specify a Databricks secret scope and key for the IV.
Transformed column options
- Substitute the new columns in place: Replaces the original column(s) with the transformed values.
- Add new columns with a prefix/suffix attached: Adds a new column for each transformed input column, appending a prefix or suffix to the name.
Example
Assume you have the following dataset:
| ID | Message |
|---|---|
| 1 | Hello world! |
| 2 | Prophecy |
Using the base64 method, adding new columns with the suffix _encoded, the output would be:
| ID | Message | Message_encoded |
|---|---|---|
| 1 | Hello world! | SGVsbG8gd29ybGQh |
| 2 | Prophecy | UHJvcGhlY3k= |
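The example above, along with the two transformed column options, can be sketched in plain Python. This is only an illustration of the column-naming behavior, not the gem's implementation; the `transform` helper and its parameters are hypothetical, and rows are modeled as dictionaries rather than a real dataset.

```python
import base64

rows = [
    {"ID": 1, "Message": "Hello world!"},
    {"ID": 2, "Message": "Prophecy"},
]


def b64(value: str) -> str:
    """Base64-encode a UTF-8 string."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def transform(rows, columns, fn, prefix="", suffix="", in_place=False):
    """Apply fn to each selected column.

    in_place=True overwrites the original column; otherwise the result
    is written to a new column named prefix + column + suffix.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for col in columns:
            target = col if in_place else f"{prefix}{col}{suffix}"
            new_row[target] = fn(row[col])
        result.append(new_row)
    return result


# Add new columns with the suffix _encoded.
with_suffix = transform(rows, ["Message"], b64, suffix="_encoded")
print(with_suffix[0])
# {'ID': 1, 'Message': 'Hello world!', 'Message_encoded': 'SGVsbG8gd29ybGQh'}

# Substitute the new columns in place.
replaced = transform(rows, ["Message"], b64, in_place=True)
print(replaced[1])
# {'ID': 2, 'Message': 'UHJvcGhlY3k='}
```

With the suffix option the original Message column is kept alongside Message_encoded; with the in-place option only the encoded value remains.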