Write Options

When you configure a target table or target model, the Write Options tab determines how your processed data is stored and how changes to that data are handled over time. The best write mode depends on your specific use case.

Write modes

Wipe and Replace Table (default)

Replaces all existing data with new data on each run. This is the simplest approach and ensures your target table always reflects the latest state of your source data.

info

The incoming table must have the same schema as the existing table.

Example: Replace vehicle registry

A full vehicle registry is refreshed each day, replacing all prior records with the latest list.

Incoming table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 201 | Bus | 2024-01-15 |
| 202 | Train | 2024-01-16 |

Existing table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 101 | Bus | 2023-12-01 |
| 102 | Tram | 2023-12-02 |

Updated table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 201 | Bus | 2024-01-15 |
| 202 | Train | 2024-01-16 |

Notice that all previous records (vehicles 101 and 102) have been completely removed and replaced with the new incoming data.
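
Behind the scenes, this mode corresponds to a full table build in dbt (see How write modes work later on this page). As a rough, hand-written sketch rather than Prophecy's exact generated code, a dbt model that rebuilds the vehicle registry on every run might look like this; the model name raw_vehicles is a placeholder:

```sql
-- Minimal sketch of a full-rebuild dbt model.
-- 'raw_vehicles' is a hypothetical upstream model name.
{{ config(materialized='table') }}

select
    vehicle_id,
    type,
    registered_at
from {{ ref('raw_vehicles') }}
```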

Partition the target table (BigQuery only)

The following partitioning parameters are available for the Wipe and Replace Table write mode on BigQuery.

| Parameter | Description |
|---|---|
| Column Name | The name of the column used for partitioning the target table. |
| Data Type | The data type of the partition column. Supported types: timestamp, date, datetime, and int64. |
| Partition By granularity | Applicable only to the timestamp, date, or datetime data types. Defines the time-based partition granularity: hour, day, month, or year. |
| Partition Range | Applicable only to the int64 data type. Specify a numeric range for partitioning using a start, end, and interval value (for example, start=0, end=1000, interval=10). You must define an interval value so that Prophecy knows at what intervals to create the partitions. |
info

Only BigQuery tables can be partitioned at the table level. To learn more about partitioning, jump to Partitioning.
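
In dbt terms, these parameters correspond to the partition_by config that dbt-bigquery supports. A minimal sketch for day-level partitioning on a date column, with illustrative column and model names rather than Prophecy-generated code:

```sql
-- Sketch: BigQuery table partitioned by day on a date column.
-- Column and model names are illustrative.
{{ config(
    materialized='table',
    partition_by={
        'field': 'registered_at',
        'data_type': 'date',
        'granularity': 'day'
    }
) }}

select * from {{ ref('raw_vehicles') }}
```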


Append Row

Adds new rows to the existing table without modifying existing data. No deduplication is performed, so you may end up with duplicate records.

info

This strategy is best used when unique keys aren't required. For key-based updates, use one of the Merge options instead.

Example: Add daily trips

Daily trips are appended so that historical trips remain intact.

Incoming table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 301 | 201 | 2024-01-15 |
| 302 | 202 | 2024-01-15 |

Existing table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 101 | 101 | 2024-01-14 |
| 102 | 102 | 2024-01-14 |

Updated table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 101 | 101 | 2024-01-14 |
| 102 | 102 | 2024-01-14 |
| 301 | 201 | 2024-01-15 |
| 302 | 202 | 2024-01-15 |

Notice that the new trips (301 and 302) are added to the existing table without modifying the original records (101 and 102).
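
For comparison, one way to hand-write append-only behavior in dbt is an incremental model with the append strategy (available on adapters such as Databricks); Prophecy's generated code may differ, and the model name raw_trips is a placeholder:

```sql
-- Sketch: append new rows without modifying existing ones.
-- No unique key is configured, so duplicates are possible.
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

select trip_id, vehicle_id, date
from {{ ref('raw_trips') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what the target already holds.
  where date > (select max(date) from {{ this }})
{% endif %}
```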


Merge - Upsert Row

If a row with the same key exists, it is updated. Otherwise, a new row is inserted. You can also limit updates to specific columns, so only selected values are changed in matching rows.

| Parameter | Description |
|---|---|
| Unique Key | Column(s) used to match existing records in the target dataset. In many cases, the unique key is equivalent to your table's primary key, if applicable. |
| Use Predicate | Lets you add conditions that specify when to apply the merge. |
| Use a condition to filter data or incremental runs | Applies conditions that filter the incoming data before it is written to the table. |
| Merge Columns | Specifies which columns to update during the merge. If empty, the merge includes all columns. |
| Exclude Columns | Defines columns that should be excluded from the merge operation. |
| (Advanced) On Schema Change | Specifies how schema changes are handled during the merge. <br/>• ignore: Newly added columns are not written to the model. This is the default option. <br/>• fail: Triggers an error message when the source and target schemas diverge. <br/>• append_new_columns: Appends new columns to the existing table. <br/>• sync_all_columns: Adds any new columns to the existing table and removes any columns that are now missing, including data type changes. This option uses the output of the previous gem. |

Example: Update vehicle type only

When an incoming VEHICLE_ID matches an existing one, vehicle type information is updated while registration dates remain unchanged. The merge column is explicitly set to TYPE.

Incoming table

| VEHICLE_ID | TYPE |
|---|---|
| 101 | Tram |
| 102 | Bus |
| 104 | Bus |

Existing table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 101 | Bus | 2023-12-01 |
| 102 | Train | 2023-12-02 |
| 103 | Bus | 2023-12-03 |

Updated table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 101 | Tram | 2023-12-01 |
| 102 | Bus | 2023-12-02 |
| 103 | Bus | 2023-12-03 |
| 104 | Bus | null |

Notice that only the TYPE column was updated for vehicles 101 and 102 while the REGISTERED_AT values remained unchanged. Additionally, note that vehicle 103 remained unchanged since it didn't match a vehicle in the incoming data. Finally, vehicle 104 was inserted, and no data was added to the REGISTERED_AT column.
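
A hand-written dbt model that approximates this example could use the merge incremental strategy with a unique_key and merge_update_columns (standard dbt configs on adapters that support the merge strategy). This is a hedged sketch with illustrative names, not Prophecy's exact output:

```sql
-- Sketch: upsert on VEHICLE_ID, updating only the TYPE column for matching rows.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='vehicle_id',
    merge_update_columns=['type'],
    on_schema_change='ignore'
) }}

select vehicle_id, type
from {{ ref('incoming_vehicles') }}
```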


Merge - Wipe and Replace Partitions

Replaces entire partitions in the target table. Only partitions containing updated data will be overwritten; other partitions will not be modified.

| Parameter | Description |
|---|---|
| Partition by | Defines the partitions of the target table. <br/>• Databricks: Each unique value of the partition column corresponds to a partition. You cannot change the granularity of the partitions. <br/>• BigQuery: You must manually define the granularity of your partitions. BigQuery does not automatically infer how to write the partitions. |
| (Advanced) On Schema Change | Specifies how schema changes are handled during the merge. <br/>• ignore: Newly added columns are not written to the model. This is the default option. <br/>• fail: Triggers an error message when the source and target schemas diverge. <br/>• append_new_columns: Appends new columns to the existing table. <br/>• sync_all_columns: Adds any new columns to the existing table and removes any columns that are now missing, including data type changes. This option uses the output of the previous gem. |

Example: Update trips for a specific day

In this case, the partition column is the DATE column, where each partition corresponds to a single day. Since the incoming data includes records from 2024-01-15, all of the records from that date are dropped from the existing table and replaced with the new data.

Incoming table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 401 | 201 | 2024-01-15 |
| 402 | 202 | 2024-01-15 |

Existing table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 101 | 101 | 2024-01-14 |
| 301 | 201 | 2024-01-15 |
| 302 | 202 | 2024-01-15 |

Updated table

| TRIP_ID | VEHICLE_ID | DATE |
|---|---|---|
| 101 | 101 | 2024-01-14 |
| 401 | 201 | 2024-01-15 |
| 402 | 202 | 2024-01-15 |

Notice that only the 2024-01-15 partition was replaced with new data, while the 2024-01-14 partition remained untouched.
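
In hand-written dbt, partition replacement like this is commonly expressed with the insert_overwrite incremental strategy plus a partition_by config. A rough sketch for a Databricks-style model (names are illustrative and Prophecy's generated code may differ):

```sql
-- Sketch: overwrite only the partitions present in the incoming data.
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by=['date']
) }}

select trip_id, vehicle_id, date
from {{ ref('incoming_trips') }}
```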

Define partition granularity (BigQuery only)

The following partitioning parameters allow you to define the partition granularity for this operation; a configuration sketch follows the table.

| Parameter | Description |
|---|---|
| Column Name | The name of the column used for partitioning the target table. |
| Data Type | The data type of the partition column. Supported types: timestamp, date, datetime, and int64. |
| Partition By granularity | Applicable only to the timestamp, date, or datetime data types. Defines the time-based partition granularity: hour, day, month, or year. |
| Partition Range | Applicable only to the int64 data type. Specify a numeric range for partitioning using a start, end, and interval value (for example, start=0, end=1000, interval=10). You must define an interval value so that Prophecy knows at what intervals to create the partitions. |
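
For integer-range partitioning specifically, the corresponding dbt-bigquery partition_by shape looks like the following sketch (shown here on a simple table materialization with illustrative names; the materialization Prophecy actually generates for this merge mode may differ):

```sql
-- Sketch: BigQuery integer-range partitioning on an int64 column.
-- Field and model names are illustrative.
{{ config(
    materialized='table',
    partition_by={
        'field': 'vehicle_id',
        'data_type': 'int64',
        'range': {'start': 0, 'end': 1000, 'interval': 10}
    }
) }}

select * from {{ ref('incoming_trips') }}
```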

Merge - SCD2

Tracks historical changes by adding new rows instead of updating existing ones. This lets you preserve a complete history, since every change generates a new entry rather than erasing old information. To be specific:

  • New rows are added for incoming records with unique keys that don't exist in the target table.
  • For records matching existing unique keys, a new row is added only when the incoming data differs from the existing record.
  • New rows are assigned a start date and a null end date (indicating that the row is currently valid).
  • If the unique key of a new record matches an existing record, the existing row is assigned an end date to mark when it stopped being valid.
  • This creates a complete timeline showing how data evolved over time.

| Parameter | Description |
|---|---|
| Unique Key | Column(s) used to match existing records in the target dataset. In many cases, the unique key is equivalent to your table's primary key, if applicable. |
| Invalidate deleted rows | When enabled, records that correspond to rows deleted from the source are marked as no longer valid. |
| Determine new records by checking timestamp column | Identifies new records using the timestamp in the Updated at column that you define. |
| Determine new records by looking for differences in column values | Identifies new records based on changes to values in one or more specified columns. |

Example: Route assignment history

Each time a vehicle changes routes, a new row is added to preserve history. In this case, Determine new records by checking timestamp column is enabled, and ASSIGNED_AT is the timestamp column.

Incoming table

| VEHICLE_ID | ROUTE | ASSIGNED_AT |
|---|---|---|
| 101 | Route B | 2024-01-15 |

Existing table

| VEHICLE_ID | ROUTE | ASSIGNED_AT | valid_from | valid_to |
|---|---|---|---|---|
| 101 | Route A | 2024-01-01 | 2024-01-01 | NULL |

Updated table

| VEHICLE_ID | ROUTE | ASSIGNED_AT | valid_from | valid_to |
|---|---|---|---|---|
| 101 | Route A | 2024-01-01 | 2024-01-01 | 2024-01-15 |
| 101 | Route B | 2024-01-15 | 2024-01-15 | NULL |

Notice that the original Route A record now has an end date (2024-01-15 taken from the ASSIGNED_AT column) and a new Route B record was added with a null end date, preserving the complete history of route assignments.
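
dbt's native mechanism for this kind of history is the snapshot, which tracks validity with dbt_valid_from and dbt_valid_to columns. Prophecy generates its own code for this write mode, but for comparison, a hand-written snapshot using the timestamp strategy behaves much like the example above (names are illustrative):

```sql
-- Sketch: SCD2-style history via a dbt snapshot with the timestamp strategy.
-- 'incoming_routes' and the target schema name are placeholders.
{% snapshot vehicle_routes_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='vehicle_id',
    strategy='timestamp',
    updated_at='assigned_at',
    invalidate_hard_deletes=True
) }}

select vehicle_id, route, assigned_at
from {{ ref('incoming_routes') }}

{% endsnapshot %}
```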


Merge - Update Row

Updates records that match conditions defined in the Use Predicate parameter. Instead of updating individual rows, this mode replaces the entire partition of the table that matches the given predicate; the predicate defines which rows in the target table should be fully replaced with the incoming data.

| Parameter | Description |
|---|---|
| Use Predicate | Lets you add conditions that specify when to apply the merge. |
| Use a condition to filter data or incremental runs | Applies conditions that filter the incoming data before it is written to the table. |
| (Advanced) On Schema Change | Specifies how schema changes are handled during the merge. <br/>• ignore: Newly added columns are not written to the model. This is the default option. <br/>• fail: Triggers an error message when the source and target schemas diverge. <br/>• append_new_columns: Appends new columns to the existing table. <br/>• sync_all_columns: Adds any new columns to the existing table and removes any columns that are now missing, including data type changes. This option uses the output of the previous gem. |

Example: Update trips from January 5th onwards

When any record from January 5th onwards needs updating, ALL records from that date forward are dropped and replaced with the incoming data. Records from before January 5th remain unchanged.

Predicate: DATE >= '2024-01-05'

Incoming table

| TRIP_ID | STATUS | DATE |
|---|---|---|
| 201 | Closed | 2024-01-05 |
| 203 | Closed | 2024-01-10 |

Existing table

| TRIP_ID | VEHICLE_ID | DATE | STATUS |
|---|---|---|---|
| 201 | 101 | 2024-01-05 | Open |
| 202 | 102 | 2023-12-20 | Open |
| 203 | 103 | 2024-01-10 | Open |
| 204 | 104 | 2024-01-15 | Open |

Updated table

| TRIP_ID | VEHICLE_ID | DATE | STATUS |
|---|---|---|---|
| 201 | 101 | 2024-01-05 | Closed |
| 203 | 103 | 2024-01-10 | Closed |
| 202 | 102 | 2023-12-20 | Open |

Notice that ALL trips from January 5th onwards (201, 203, and 204) were dropped and replaced with only the incoming records (201 and 203), while trip 202 (from December 20th) remained unchanged.
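
Conceptually, the operation behaves like a delete of everything matching the predicate followed by an insert of the incoming rows. The SQL below is an illustrative sketch only; the table names are placeholders and this is not the code Prophecy generates:

```sql
-- Conceptual shape of a predicate-based replace.
-- Drop every target row matching the predicate...
delete from target_trips
where date >= '2024-01-05';

-- ...then insert the incoming rows for that range.
insert into target_trips
select *
from incoming_trips
where date >= '2024-01-05';
```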


Merge - Delete and Insert

Deletes rows in the target table whose keys match rows in the incoming dataset, then inserts the incoming rows in their place. Incoming rows with new keys are inserted normally. This ensures that updated records are fully replaced instead of partially updated.

note

This strategy helps when your unique key is not truly unique, for example when multiple rows per key need to be fully refreshed.

| Parameter | Description |
|---|---|
| Unique Key | Column(s) used to match existing records in the target dataset. |
| Use Predicate | Lets you add conditions that specify when to apply the merge. |
| Use a condition to filter data or incremental runs | Applies conditions that filter the incoming data before it is written to the table. |
| (Advanced) On Schema Change | Specifies how schema changes are handled during the merge. <br/>• ignore: Newly added columns are not written to the model. This is the default option. <br/>• fail: Triggers an error message when the source and target schemas diverge. <br/>• append_new_columns: Appends new columns to the existing table. <br/>• sync_all_columns: Adds any new columns to the existing table and removes any columns that are now missing, including data type changes. This option uses the output of the previous gem. |

Example: Refresh vehicle registry entries

If a vehicle already exists in the fleet, its old record is deleted and replaced with the incoming row. New vehicles are simply inserted.

Incoming table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 201 | Train | 2024-01-15 |
| 203 | Bus | 2024-01-16 |

Existing table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 201 | Train | 2023-12-01 |
| 202 | Tram | 2023-12-02 |

Updated table

| VEHICLE_ID | TYPE | REGISTERED_AT |
|---|---|---|
| 201 | Train | 2024-01-15 |
| 202 | Tram | 2023-12-02 |
| 203 | Bus | 2024-01-16 |

Notice that vehicle 201's old record was completely replaced with the new data, vehicle 202 remained unchanged, and vehicle 203 was added as a new entry.
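
In hand-written dbt, this behavior resembles the delete+insert incremental strategy (available on adapters such as Snowflake). A minimal sketch with illustrative names, not Prophecy's exact output:

```sql
-- Sketch: delete target rows whose keys appear in the incoming data,
-- then insert the incoming rows.
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='vehicle_id'
) }}

select vehicle_id, type, registered_at
from {{ ref('incoming_vehicles') }}
```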


How write modes work

Prophecy simplifies data transformation by providing intuitive write mode options that abstract away the complexity of underlying SQL operations. Behind the scenes, Prophecy generates dbt models that implement these write strategies using SQL warehouse-specific commands.

| Write mode | Concept in dbt |
|---|---|
| Wipe and Replace Table | Maps to dbt's materialized: 'table' strategy, which creates a new table on each run. |
| Append Row | Maps to dbt's materialized: 'table' with append-only logic, adding new rows without modifying existing data. |
| Merge options | Map to dbt's materialized: 'incremental' strategy, which updates tables by only transforming and loading new or changed data since the last run. |

When you select a write mode in a Table gem, Prophecy automatically generates the appropriate dbt configuration and SQL logic. This means you can focus on your data transformation logic rather than learning dbt's materialization strategies or writing complex SQL merge statements.

note

To understand exactly what happens when Prophecy runs these write operations, switch to the Code view of your project and inspect the generated dbt model files. These files contain the SQL statements and dbt configuration (like materialized: 'incremental') that dbt uses to execute the write operation. To learn more about the specific configuration options available for each SQL warehouse, see the dbt documentation for your warehouse.
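
For example, an incremental model in the generated code typically wraps its filter in dbt's is_incremental() macro so that only new or changed rows are processed after the first run. The following is a simplified sketch of that shape with placeholder names, not the exact code Prophecy emits:

```sql
-- Simplified sketch of an incremental dbt model.
{{ config(
    materialized='incremental',
    unique_key='trip_id'
) }}

select trip_id, vehicle_id, date
from {{ ref('raw_trips') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than the current maximum in the target.
  where date > (select max(date) from {{ this }})
{% endif %}
```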

Partitioning

Depending on the SQL warehouse you use to write tables, partitioning behaves differently. Let's examine the differences between partitioning in Google BigQuery and Databricks; a configuration sketch follows the list below.

  1. BigQuery: Partitioning is a table property.

    In BigQuery, partitioning is defined at the table schema level (time, integer range, or column value). Because partitioning is part of the table architecture, BigQuery automatically optimizes physical storage according to the partitions.

    Once a table is partitioned, every write to that table, full or incremental, respects the partitioning. That means even when you drop and recreate the table entirely, BigQuery creates it with optimized partitions.

  2. Databricks: Partitioning is a write strategy.

    Databricks organizes data in folders by column values. Partitioning only makes sense when you’re using the Wipe and Replace Partitions write mode because it allows you to overwrite specific directories (partitions) without rewriting the whole table.

    For the Wipe and Replace Table option, the table is dropped and completely recreated. Partitioning doesn’t add any runtime benefit here, so this is not an option for Databricks.
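
The difference shows up in how partition_by is expressed when you write the equivalent dbt models by hand (sketches with illustrative names, not Prophecy's generated code). On BigQuery, the partition definition carries an explicit data type and granularity:

```sql
-- BigQuery: partitioning is part of the table definition, so granularity is explicit.
{{ config(
    materialized='table',
    partition_by={'field': 'date', 'data_type': 'date', 'granularity': 'day'}
) }}

select * from {{ ref('raw_trips') }}
```

On Databricks, partitioning is a write layout keyed on raw column values, typically paired with the insert_overwrite strategy:

```sql
-- Databricks: partitions are folders keyed on column values; no granularity setting.
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by=['date']
) }}

select * from {{ ref('raw_trips') }}
```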

Write modes matrix

The following table describes the write modes that Prophecy supports by SQL warehouse and gem type.

| Write mode | Databricks table | Databricks model | BigQuery table | BigQuery model | Snowflake model |
|---|---|---|---|---|---|
| Wipe and Replace Table | | | | | |
| Append Row | | | | | |
| Merge - Upsert Row | | | | | |
| Merge - Wipe and Replace Partitions | | | | | |
| Merge - SCD2 | | | | | |
| Merge - Update Row | | | | | |
| Merge - Delete and Insert | | | | | |

Troubleshooting

Schema mismatch errors

Problem: Incoming and existing schemas don't align.

Solution: Use the "On Schema Change" setting to set behavior or ensure schema compatibility.

Duplicate data

Problem: Unexpected duplicate records appear in the written table.

Solution: Consider using one of the Merge write modes, which match records on a unique key, instead of Append Row.