Intersect gem
Use the Intersect Gem to return only the rows that appear in all input tables. This is useful for identifying overlapping data, such as customers who are active on multiple platforms, or transactions that appear across different systems or logs.
The Intersect gem has a corresponding interactive gem example. See Interactive gem examples to learn how to run sample pipelines for this and other gems.
Input and Output
| Port | Description |
|---|---|
| in0 | The first input table to compare. |
| in1 | The second input table to compare. |
| inN | Optional: Additional tables to compare. |
| out | A single table containing only rows that appear in all input tables. |
To add additional input ports, click + next to Ports.
All input tables must have identical schemas (matching column names and data types).
Parameters
| Parameter | Description |
|---|---|
| Operation Type | Shows that the set operation type is Intersect |
| Preserve duplicate rows | Checkbox to keep duplicates in the output table |
Example
Let’s say you’re working with two tables: Table A and Table B.
- Both tables contain order-related data.
- Table A contains order information from customer
1,2, and3. - Table B contains order information from customer
1,2,3, and4. - Tables A and B each contain some identical records (duplicates).
Table A
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 101 | 1 | 2024-12-01 | 250.00 |
| 102 | 2 | 2024-12-03 | 150.00 |
| 101 | 1 | 2024-12-01 | 250.00 |
| 104 | 3 | 2025-02-10 | 200.00 |
Table B
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 101 | 1 | 2024-12-01 | 250.00 |
| 104 | 3 | 2025-02-10 | 200.00 |
| 105 | 4 | 2025-03-05 | 400.00 |
| 106 | 2 | 2025-03-07 | 180.00 |
| 101 | 1 | 2024-12-01 | 250.00 |
Result
Default
The table that results from the Intersect gem only includes the records present in all the input tables.
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 101 | 1 | 2024-12-01 | 250.00 |
| 104 | 3 | 2025-02-10 | 200.00 |
The output indicates that order 101 and 104 appear in both Table A and Table B.
By default, the duplicate row for order 101 is removed from the output.
Preserve duplicates
If the preserve duplicates option is selected, the duplicates present in all inputs are preserved.
| order_id | customer_id | order_date | amount |
|---|---|---|---|
| 101 | 1 | 2024-12-01 | 250.00 |
| 101 | 1 | 2024-12-01 | 250.00 |
| 104 | 3 | 2025-02-10 | 200.00 |
The output is the same as above, except the duplicate order 101 is retained.