Unit tests for Spark
Writing good unit tests is one of the key stages of the CI/CD process. Unit tests verify the changes developers make to a project, which helps ensure that existing functionality keeps working correctly after deployment.
Prophecy makes writing unit tests easier by providing an interactive environment in which unit test cases can be configured for each component.
Two types of unit test cases can be configured through the Prophecy UI:
- Output rows equality
- Output predicates
Let's look at each type in detail.
Output rows equality
This test type automatically takes a snapshot of the component's data and lets you continuously test that the logic performs as intended. It simply checks that the output rows are equal to the expected rows.
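As a rough illustration of what checking "equality of the output rows" involves, the sketch below compares a stored snapshot with the current output as an unordered collection of rows. It is plain Python with rows modelled as dictionaries and hypothetical column names. It is not Prophecy's implementation. It only shows one reasonable way to define row equality, ignoring row order but counting duplicates.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def rows_equal(expected: Iterable[Row], actual: Iterable[Row]) -> bool:
    """Return True if both collections hold the same rows, ignoring row order.

    Rows are compared as multisets, so a duplicated row must appear the
    same number of times on both sides.
    """
    def to_key(row: Row) -> tuple:
        # Sort by column name so that column order does not matter either.
        return tuple(sorted(row.items()))

    return Counter(map(to_key, expected)) == Counter(map(to_key, actual))


# Snapshot captured from an earlier run (hypothetical data).
snapshot: List[Row] = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
]

# Current output of the component, in a different row and column order.
current_output: List[Row] = [
    {"amount": 75.5, "customer_id": 2},
    {"amount": 120.0, "customer_id": 1},
]

print(rows_equal(snapshot, current_output))  # True
```

In PySpark, DataFrame.collect() returns Row objects whose asDict() method yields dictionaries like these. The same comparison could therefore be applied to small collected outputs.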
Example
In the example below, we create the following unit tests:
- Check that the join condition works correctly for one-to-one mappings.
- Check that the join condition works correctly for one-to-many mappings.
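The two join checks above can be sketched outside Prophecy as follows. The sketch makes several assumptions. It uses hypothetical customers, profiles and orders tables. A naive nested-loop inner join stands in for the Spark join. Expected rows are written by hand rather than captured as a snapshot. The key point is the one-to-many case: a customer with two orders must appear twice in the joined output, and a row-equality test catches a join condition that drops or duplicates rows.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Naive inner join on a single key column (stand-in for a Spark join)."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[key] == right_row[key]
    ]


def same_rows(a: List[Row], b: List[Row]) -> bool:
    """Order-insensitive comparison of two lists of rows."""
    def to_key(row: Row) -> tuple:
        return tuple(sorted(row.items()))

    return Counter(map(to_key, a)) == Counter(map(to_key, b))


customers = [
    {"customer_id": 1, "first_name": "Ana"},
    {"customer_id": 2, "first_name": "Raj"},
]

# One-to-one: each customer has exactly one profile row.
profiles = [
    {"customer_id": 1, "tier": "gold"},
    {"customer_id": 2, "tier": "silver"},
]
expected_one_to_one = [
    {"customer_id": 1, "first_name": "Ana", "tier": "gold"},
    {"customer_id": 2, "first_name": "Raj", "tier": "silver"},
]
assert same_rows(inner_join(customers, profiles, "customer_id"), expected_one_to_one)

# One-to-many: customer 1 has two orders, so it must appear twice.
orders = [
    {"customer_id": 1, "order_id": 10},
    {"customer_id": 1, "order_id": 11},
    {"customer_id": 2, "order_id": 12},
]
expected_one_to_many = [
    {"customer_id": 1, "first_name": "Ana", "order_id": 10},
    {"customer_id": 1, "first_name": "Ana", "order_id": 11},
    {"customer_id": 2, "first_name": "Raj", "order_id": 12},
]
assert same_rows(inner_join(customers, orders, "customer_id"), expected_one_to_many)
print("join tests passed")
```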
Output predicates
These are more advanced unit tests in which multiple rules must pass for the test as a whole to pass. Each rule is written as a Spark expression that acts as a predicate.
Example
In the example below, we create the following unit tests:
- Check that the value of the amount column is greater than 0 (amount > 0).
- Check that the first name is not equal to the last name.
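The "all rules must pass" behaviour can be sketched in plain Python. Each rule is named after the Spark expression it stands for and is implemented as a Python function over a row. The test fails if any rule is violated by any output row. The column names and data are hypothetical, and this is not how Prophecy evaluates the expressions internally.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Python stand-ins for the Spark expressions "amount > 0" and
# "first_name != last_name" (column names are hypothetical).
RULES: Dict[str, Predicate] = {
    "amount > 0": lambda row: row["amount"] > 0,
    "first_name != last_name": lambda row: row["first_name"] != row["last_name"],
}


def failing_rules(rows: List[Row], rules: Dict[str, Predicate]) -> List[str]:
    """Return the names of rules violated by at least one row."""
    return [name for name, rule in rules.items() if not all(rule(r) for r in rows)]


output_rows = [
    {"first_name": "Ana", "last_name": "Silva", "amount": 42.0},
    {"first_name": "Lee", "last_name": "Lee", "amount": 0.0},
]

failures = failing_rules(output_rows, RULES)
print(failures)  # ['amount > 0', 'first_name != last_name']
# The test as a whole passes only when no rule fails.
print("PASS" if not failures else "FAIL")  # FAIL
```

In Spark itself, a rule could be checked by counting the rows where it does not hold, for example df.filter(~F.expr("amount > 0")).count() == 0, with pyspark.sql.functions imported as F. Note that filter drops rows where the expression evaluates to NULL, so rows with a null amount would not be counted as failures by this check.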
Generating sample data for test cases automatically
To generate sample input data automatically from the source DataFrame, enable this option while creating the unit test.
The Pipeline needs to run once before unit tests can be generated from the auto-generated sample data.
Let's generate sample data automatically for the unit test case we created in the example above.
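The run-once requirement follows from how sampling works: a sample can only be drawn from data the source has actually produced. The sketch below shows one common way to make such a sample reproducible, by drawing it with a fixed random seed so the test input stays the same from run to run. It is an assumption-laden illustration in plain Python with hypothetical data. Prophecy's actual sampling strategy is not described here. In Spark, DataFrame.sample(fraction=..., seed=...) plays a similar role.

```python
import random
from typing import Any, Dict, List

Row = Dict[str, Any]


def sample_rows(source: List[Row], k: int, seed: int = 42) -> List[Row]:
    """Pick up to k rows from the source using a fixed seed.

    Fixing the seed keeps the sample identical across runs, so a unit test
    built on it does not change from one run to the next.
    """
    rng = random.Random(seed)
    return rng.sample(source, min(k, len(source)))


# Hypothetical source data, standing in for the output of one Pipeline run.
source = [{"customer_id": i, "amount": i * 10.0} for i in range(1, 101)]

sample_a = sample_rows(source, 5)
sample_b = sample_rows(source, 5)
print(sample_a == sample_b)  # True: same seed, same sample
```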
Generated code
Behind the scenes, the code for the unit tests is automatically generated in our repository. Let's have a look at the generated code for the unit test above.
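For readers without access to the generated files, the sketch below illustrates the general shape such a test file can take. It has fixed input rows, a call to the component's logic, and one test method per check, covering both an output-rows-equality check and an output-predicate check. It is a hypothetical example written with Python's unittest module. The component filter_positive_amounts and its data are invented for illustration, and this is not the code Prophecy generates, which depends on the project's language and Prophecy version.

```python
import unittest
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_positive_amounts(rows: List[Row]) -> List[Row]:
    """Hypothetical component under test: keeps rows with amount > 0."""
    return [row for row in rows if row["amount"] > 0]


def as_multiset(rows: List[Row]) -> Counter:
    """Turn rows into an order-insensitive multiset for comparison."""
    return Counter(tuple(sorted(row.items())) for row in rows)


class FilterPositiveAmountsTest(unittest.TestCase):
    def setUp(self) -> None:
        # Fixed sample input, playing the role of the captured test data.
        self.input_rows = [
            {"id": 1, "amount": 10.0},
            {"id": 2, "amount": -5.0},
            {"id": 3, "amount": 0.0},
        ]

    def test_output_rows_equality(self) -> None:
        expected = [{"id": 1, "amount": 10.0}]
        actual = filter_positive_amounts(self.input_rows)
        self.assertEqual(as_multiset(actual), as_multiset(expected))

    def test_output_predicates(self) -> None:
        actual = filter_positive_amounts(self.input_rows)
        self.assertTrue(all(row["amount"] > 0 for row in actual))
```

Saved to a file, these tests can be run with python -m unittest followed by the file name.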