Skip to main content

Deduplicate

SQL

Removes rows with duplicate values of specified columns.

Mode

Next to Deduplicate On Columns, choose how to keep certain rows.

ModeDescriptionAdditional parametersOutput
Distinct Rows (Default)Keeps all distinct rows.NoneAll columns are passed through unless columns are specified. Specified columns are persisted in the output.
Unique OnlyKeeps rows that do not have duplicates.
  • Expression: Column that determines uniqueness
All columns are passed through.
FirstKeeps the first occurrence of the duplicate row.
  • Expression: Column that determines uniqueness
  • Use Custom Order By: Sort the rows
All columns are passed through.
LastKeeps the last occurrence of the duplicate row.
  • Expression: Column that determines uniqueness
  • Use Custom Order By: Sort the rows
All columns are passed through.

Example

Suppose you're deduplicating the following table.

First_NameLast_NameTypeContact
JohnDoephone123-456-7890
JohnDoephone123-456-7890
JohnDoephone123-456-7890
AliceJohnsonphone246-135-0987
AliceJohnsonphone246-135-0987
AliceJohnsonemailalice@johnson.com
AliceJohnsonemailalice@johnson.com
BobSmithemailbob@smith.com

Distinct Rows

If you use Distinct Rows, the output would be:

First_NameLast_NameTypeContact
JohnDoephone123-456-7890
AliceJohnsonphone246-135-0987
AliceJohnsonemailalice@johnson.com
BobSmithemailbob@smith.com

If you want to remove the Alice Johnson duplicates, you can specify a subset of columns to deduplicate. In this case, you want to determine duplication based on First_Name and Last_Name columns. However, this will remove additional columns. Use First or Last to preserve the other columns in the output.

First_NameLast_Name
JohnDoe
AliceJohnson
BobSmith

Unique Only

For Unique Only, the output would be:

First_NameLast_NameTypeContact
BobSmithemailbob@smith.com

This outputs one unique row because the rest were duplicates.

First and Last

The First and Last options work similarly to Distinct Rows, but they keep the first and last occurrence of the duplicate rows respectively.