Unit tests for Spark

Writing good unit tests is a key stage of the CI/CD process. Unit tests verify the changes developers make to a project and help ensure that its functionality still works correctly after deployment.

Prophecy makes writing unit tests easier by providing an interactive environment in which test cases can be configured for each component.

Two types of unit test cases can be configured through the Prophecy UI:

  1. Output rows equality
  2. Output predicates

Let us understand both types in detail:

Output rows equality

This test type automatically takes a snapshot of the component's data and lets you continuously verify that the logic performs as intended. It simply checks that the component's output rows are equal to the rows in the snapshot.

Example

In the example below, we create the following unit tests:

  1. Check that the join condition works correctly for one-to-one mappings.
  2. Check that the join condition works correctly for one-to-many mappings.
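
To make the idea of a row-equality check concrete, the following is a minimal sketch in plain Python rather than Spark. It is not the code Prophecy generates. The helpers `inner_join` and `rows_equal`, the column names, and the sample rows are all hypothetical. It joins two small inputs and compares the result with an expected snapshot, ignoring row order but respecting duplicates.

```python
from collections import Counter

def inner_join(left, right, key):
    """Inner-join two lists of dict rows on `key` (hypothetical stand-in for a Join component)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def rows_equal(actual, expected):
    """Order-insensitive row comparison that still counts duplicate rows."""
    def as_bag(rows):
        return Counter(tuple(sorted(row.items())) for row in rows)
    return as_bag(actual) == as_bag(expected)

# Hypothetical sample data; the column names are assumptions.
customers = [
    {"customer_id": 1, "first_name": "Ann"},
    {"customer_id": 2, "first_name": "Bob"},
]

# One-to-one: each customer has exactly one matching order.
orders_1to1 = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 25.0},
]
expected_1to1 = [
    {"customer_id": 1, "first_name": "Ann", "amount": 10.0},
    {"customer_id": 2, "first_name": "Bob", "amount": 25.0},
]

# One-to-many: customer 1 has two orders, so two output rows are expected.
orders_1tomany = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
]
expected_1tomany = [
    {"customer_id": 1, "first_name": "Ann", "amount": 5.0},
    {"customer_id": 1, "first_name": "Ann", "amount": 10.0},
]

one_to_one_ok = rows_equal(
    inner_join(customers, orders_1to1, "customer_id"), expected_1to1
)
one_to_many_ok = rows_equal(
    inner_join(customers, orders_1tomany, "customer_id"), expected_1tomany
)
```

Comparing rows as a multiset rather than a list keeps the check stable when the join returns rows in a different order, while still catching a missing or extra duplicate in the one-to-many case.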

Output predicates

These are more advanced unit tests in which multiple rules must all pass for the test as a whole to pass. Each rule is written as a Spark expression that serves as a predicate.

Example

In the example below, we create the following unit tests:

  1. Check that the value of the amount column is greater than 0.
  2. Check that the first name is not equal to the last name.
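
The two rules above can be modeled in plain Python as a rough sketch of how predicate-based tests behave. This is not Prophecy's generated code and does not use Spark. The helper `check_predicates` and the column names `amount`, `first_name`, and `last_name` are assumptions; in the Prophecy UI the rules would be written as Spark expressions, presumably along the lines of `amount > 0` and `first_name != last_name`.

```python
def check_predicates(rows, predicates):
    """Return the names of predicates that fail for at least one row."""
    return [
        name
        for name, predicate in predicates.items()
        if not all(predicate(row) for row in rows)
    ]

# Each rule must hold for every output row (hypothetical column names).
predicates = {
    "amount > 0": lambda row: row["amount"] > 0,
    "first_name != last_name": lambda row: row["first_name"] != row["last_name"],
}

good_rows = [
    {"first_name": "Ann", "last_name": "Lee", "amount": 10.0},
    {"first_name": "Bob", "last_name": "Ray", "amount": 3.5},
]
bad_rows = [
    {"first_name": "Ann", "last_name": "Lee", "amount": -1.0},
    {"first_name": "Bob", "last_name": "Ray", "amount": 3.5},
]

failed_good = check_predicates(good_rows, predicates)
failed_bad = check_predicates(bad_rows, predicates)

# The test as a whole passes only when no predicate fails.
good_test_passes = not failed_good
bad_test_passes = not failed_bad
```

Reporting the names of the failing predicates, rather than a single boolean, makes it easier to see which rule broke when the test as a whole fails.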

Generating sample data for test cases automatically

Sample input data can be generated automatically from the source DataFrame by enabling this option while creating a unit test.

note

The pipeline needs to run once before unit tests can be generated from auto-generated sample data.

Let's automatically generate sample data for the unit test case we created in the example above.

Generated code

Behind the scenes, the code for unit tests is automatically generated in our repository. Let's have a look at the generated code for our unit test above.

Renaming a unit test