DataEncoderDecoder
The DataEncoderDecoder gem allows you to encode or decode data in selected columns using a variety of standard techniques, including Base64, Hex, and AES encryption. You can transform values in-place or create new output columns with a prefix or suffix.
Input and Output
The DataEncoderDecoder gem accepts the following input and output.
Port | Description |
---|---|
in0 | Input dataset containing one or more string or binary columns to encode or decode. |
out | Output dataset with transformed values. Which columns appear in the output depends on the Transformed column options setting. |
Parameters
Configure the DataEncoderDecoder gem using the following parameters.
Parameter | Description |
---|---|
Select columns to encode/decode | One or more columns to apply the transformation to. |
Select encode/decode option | The encoding or decoding method to apply. See Encode/Decode methods. |
Transformed column options | Choose how the output should be written. See Transformed column options. |
Encode/Decode methods
Choose from the following methods:
base64
Encodes the selected column(s) using Base64.
unbase64
Decodes Base64-encoded column values.
hex
Encodes the selected column(s) into hexadecimal format.
unhex
Decodes hexadecimal-encoded values.
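To illustrate what these four methods do to a single value, here is a minimal sketch using only Python's standard library. It is not the gem's implementation: the helper names are hypothetical, the input is assumed to be UTF-8 text, and the gem's own output may differ in details such as the letter case of hex digits.

```python
import base64


def to_base64(text: str) -> str:
    """Encode a UTF-8 string as Base64 text (hypothetical helper)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode Base64 text back to the original UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


def to_hex(text: str) -> str:
    """Encode a UTF-8 string as lowercase hexadecimal digits."""
    return text.encode("utf-8").hex()


def from_hex(encoded: str) -> str:
    """Decode hexadecimal digits back to the original UTF-8 string."""
    return bytes.fromhex(encoded).decode("utf-8")


value = "Prophecy"
b64_value = to_base64(value)   # Base64 form of the value
hex_value = to_hex(value)      # hexadecimal form of the value

# Each decoder reverses its matching encoder.
round_trip_ok = from_base64(b64_value) == value and from_hex(hex_value) == value
```

Each decoding method is only meaningful on values produced by the matching encoding method; decoding arbitrary text as Base64 or hex will generally fail or produce garbage.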
encode
Encodes the string column(s) using a specified character set.
- Charset: Character set to use, such as UTF-8.
decode
Decodes the string column(s) using a specified character set.
- Charset: Character set to use, such as UTF-8.
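The role of the Charset setting can be sketched in plain Python. This is an illustration under assumptions, not the gem's code: the helper names are hypothetical, and ISO-8859-1 is used only as an example of a second character set. The point is that the same text maps to different bytes under different character sets, so decoding must use the character set that was used for encoding.

```python
def encode_with_charset(text: str, charset: str) -> bytes:
    """Turn a string into bytes using the named character set."""
    return text.encode(charset)


def decode_with_charset(data: bytes, charset: str) -> str:
    """Turn bytes back into a string using the named character set."""
    return data.decode(charset)


text = "café"

# The accented character takes two bytes in UTF-8 and one in ISO-8859-1.
utf8_bytes = encode_with_charset(text, "UTF-8")
latin1_bytes = encode_with_charset(text, "ISO-8859-1")

# Decoding with the matching charset restores the original text.
restored = decode_with_charset(utf8_bytes, "UTF-8")

# Decoding with a different charset silently yields different text.
mismatched = decode_with_charset(utf8_bytes, "ISO-8859-1")
```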
aes_encrypt
Encrypts the selected column(s) using AES encryption.
- Secret scope: The name of the Databricks secret scope.
- Secret key: The key name within the scope that stores the encryption key.
- Mode: AES encryption mode to use: GCM, CBC, or ECB.
- (Optional) AAD scope and key: For GCM mode, you can specify a Databricks secret scope and key for the additional authenticated data (AAD).
- (Optional) Initialization vector scope and key: For CBC mode, specify a Databricks secret scope and key for the IV.
Transformed column options
- Substitute the new columns in place: Replaces the original column(s) with the transformed values.
- Add new columns with a prefix/suffix attached: Adds a new column for each transformed input column. The new column's name is the original name with the prefix or suffix attached.
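The naming behavior of the two options can be sketched as a small function. The function name and its parameters are hypothetical, chosen for illustration only, and do not correspond to settings exposed by the gem.

```python
def output_column_name(column: str, in_place: bool,
                       affix: str = "", as_prefix: bool = False) -> str:
    """Return the name of the column that receives the transformed values.

    in_place=True models "Substitute the new columns in place";
    otherwise the affix is attached as a prefix or a suffix.
    """
    if in_place:
        return column  # the original column is overwritten
    return f"{affix}{column}" if as_prefix else f"{column}{affix}"


in_place_name = output_column_name("Message", in_place=True)
suffixed_name = output_column_name("Message", in_place=False, affix="_encoded")
prefixed_name = output_column_name("Message", in_place=False,
                                   affix="encoded_", as_prefix=True)
```

With the in-place option the original values are no longer available in the output, so adding new columns is the safer choice when the untransformed data is still needed downstream.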
Example
Assume you have the following dataset:
ID | Message |
---|---|
1 | Hello world! |
2 | Prophecy |
Using the base64 method and adding new columns with the suffix _encoded, the output would be:
ID | Message | Message_encoded |
---|---|---|
1 | Hello world! | SGVsbG8gd29ybGQh |
2 | Prophecy | UHJvcGhlY3k= |
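The values in this table can be checked with a short standard-library sketch. It models the dataset as a list of dictionaries and assumes UTF-8 input; the function name is hypothetical and this is not how the gem itself is implemented.

```python
import base64

# The example dataset, one dictionary per row.
rows = [
    {"ID": 1, "Message": "Hello world!"},
    {"ID": 2, "Message": "Prophecy"},
]


def add_encoded_column(rows, column, suffix):
    """Return new rows with a Base64-encoded copy of `column` added."""
    result = []
    for row in rows:
        encoded = base64.b64encode(row[column].encode("utf-8")).decode("ascii")
        # Keep the original column and add the suffixed one next to it.
        result.append({**row, column + suffix: encoded})
    return result


encoded_rows = add_encoded_column(rows, "Message", "_encoded")
for row in encoded_rows:
    print(row["ID"], row["Message"], row["Message_encoded"])
```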