RecordID gem

Assigning unique identifiers to each row in a dataset is a common requirement for data preparation. The RecordID gem allows you to easily generate row-level IDs using two methods: UUID for randomly generated values, and Incremental ID for ordered, sequential values. You can customize how IDs are added, including naming the column, setting the data type and format, and specifying where the column appears in the output schema.

Input and Output

The RecordID gem accepts the following input and generates one output.

Port    Description
in0     Input dataset containing the records you wish to assign IDs to.
out     Output dataset with a new record ID column.

Parameters

Review the following gem parameters by method.

UUID

The UUID method assigns a universally unique identifier (UUID) to each row. These values are randomly generated and are ideal when you need non-sequential, non-predictable IDs.

Parameter             Description
Output Column Name    Name of the new column where the generated UUIDs will be stored.
Column position       Choose to add as first column or add as last column in the output dataset.
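
The effect of the UUID method can be illustrated outside the gem with a few lines of plain Python. This is only a sketch, not the gem's implementation: the function `add_uuid_column`, its arguments, and the use of version 4 (random) UUIDs are assumptions made for illustration, and rows are modelled as dictionaries whose key order stands in for column order.

```python
import uuid


def add_uuid_column(rows, column_name="record_id", position="first"):
    """Return copies of the rows with a random UUID stored under column_name.

    Hypothetical helper: `position` mimics the "Column position" parameter
    and accepts "first" or "last".
    """
    if position not in ("first", "last"):
        raise ValueError("position must be 'first' or 'last'")

    result = []
    for row in rows:
        new_id = {column_name: str(uuid.uuid4())}  # random (version 4) UUID
        # Dict key order plays the role of column order in the output schema.
        result.append({**new_id, **row} if position == "first" else {**row, **new_id})
    return result


customers = [{"name": "Ada"}, {"name": "Grace"}]
for row in add_uuid_column(customers, column_name="customer_uuid", position="last"):
    print(row)
```

Each run prints different identifiers, which is the point of this method: the values carry no ordering and cannot be predicted from the row's position.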

Incremental ID

The Incremental ID method generates sequential values starting from a specified number and increasing by 1 for each row. You can also group and sort the data to restart numbering within each group and control the order in which IDs are assigned.

Parameter                                  Description
Output Column Name                         Name of the new column that will contain the generated record IDs.
Starting Value                             The first number in the sequence.
Data type                                  Select integer or string as the output type of the record ID values.
Size (String only)                         Total number of characters in the string. Leading zeros will be added if the starting value is shorter than the defined size.
Column position                            Choose to add as first column or add as last column in the output dataset.
Record ID Generation Scope                 Specify whether to generate IDs across the entire table or within each group defined by selected columns.
Group By Columns                           When generating IDs within groups, choose one or more columns to group the data by.
Order Rows Within Each Group (Optional)    Define the columns that determine the order in which rows are numbered within each group. You can select multiple columns and sort them in ascending or descending order.
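
How the Incremental ID parameters interact can be sketched in plain Python. This is an illustration under stated assumptions, not the gem's implementation: `add_incremental_id` and its arguments are hypothetical names, a single sort direction is applied to all ordering columns, the new column is always appended last, and zero-padding is applied to every value in the sequence (values longer than `size` are left untruncated).

```python
from collections import defaultdict


def add_incremental_id(rows, column_name="record_id", start=1,
                       as_string=False, size=6,
                       group_by=(), order_by=(), descending=False):
    """Return copies of the rows with a sequential ID appended as the last column.

    Hypothetical helper: numbering restarts at `start` for each distinct
    combination of the group_by columns; an empty group_by numbers the whole table.
    """
    # Bucket row positions by group key. With no group_by columns,
    # every row falls into the single bucket keyed by the empty tuple.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[col] for col in group_by)].append(index)

    ids = {}
    for indexes in groups.values():
        if order_by:
            # Decide the numbering order inside the group.
            indexes = sorted(indexes,
                             key=lambda i: tuple(rows[i][col] for col in order_by),
                             reverse=descending)
        for offset, index in enumerate(indexes):
            value = start + offset
            # String IDs are left-padded with zeros up to `size` characters.
            ids[index] = str(value).zfill(size) if as_string else value

    return [{**row, column_name: ids[i]} for i, row in enumerate(rows)]


orders = [
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 10},
    {"region": "east", "amount": 20},
]

# Restart numbering per region, ordering rows by amount within each region.
print(add_incremental_id(orders, "id", group_by=("region",), order_by=("amount",)))

# Number the whole table as 4-character strings starting at 7.
print(add_incremental_id(orders, "id", start=7, as_string=True, size=4))
```

In the first call the two `east` rows receive IDs 2 and 1 (the row with amount 20 is numbered first) and the `west` row receives 1. In the second call the IDs are `"0007"`, `"0008"`, and `"0009"` in input order.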