Skip to main content

Data tests for SQL

A data test is an assertion you define about a dataset in your project. Use data tests to verify that your business-critical data is consistently generated and meets expectations over time. Instead of manually reviewing datasets after each pipeline run or model execution, you can define tests that automatically validate your data and ensure data quality.

When you create data tests in the visual canvas, Prophecy runs dbt for the underlying test execution and simplifies the test definitions that are normally defined in .sql and .yaml files.

Prerequisites

To use data tests in your SQL project, you will need:

  • A Prophecy project where SQL is the primary language.
  • A Prophecy fabric or SQL fabric.
info

Prophecy does not support data tests for fabrics configured with BigQuery and a CMEK.

Test types

There are two different data test types supported in Prophecy:

  • Project tests: Singular-use tests that apply to one specific table.
  • Generic tests (Test definitions): Generic tests that can be reused repeatedly for table and column tests.
General situationsProject testModel testColumn test
Test a single tableTickCrossCross
Test multiple tablesCrossTickTick

When to use each test type

See a few recommendations in the following tables to get an idea of when to use each test type.

Specific situationsProject testModel testColumn test
Test for referential integrityTickCrossCross
Test for late arriving dataTickCrossCross
Test for data consistency verificationTickCrossCross
Test for model size and aggregationsCrossTickCross
Test for column data format and data presence (nulls, empty strings, etc.)CrossCrossTick

Example: Referential integrity check

The following test named ref_int_orders_customers checks for referential integrity. In particular, it checks if every customer_id entry in the orders table is present in the customers table.

Project test canvas

This test starts with several models from the HelloWorld_SQL project, combines their data with a series of transformation steps, and feeds the resulting table into the Data Test gem.

If there are customer_id entries in the orders table that are not present in the customers table, then the ref_int_orders_customers test fails.

What's next

To set up a project test, see Use project tests.

If you need to reuse a test that is defined by a parametrized query, see Use model tests.