Transform Gems that generate code but do not work within Streaming Applications include
Window would work with a watermarked column (explained below) as part of the partitioning, it is advised to use
session_window() from the
pyspark.sql.functions package (link).
Watermarking is a technique that enables aggregations on streaming data by limiting the state over which the aggregation is performed. In order to prevent out-of-memory errors, we have introduced support for watermarking. More information on watermarking is available in the Spark documentation here
We have added a Watermarking Gem in the Transform Section that allows a user to add a Watermark to a DataFrame.
In this example, we add Watermarking to the
timestamp column. A user may enter the column name or select one from the Schema Table on the left. The text box is editable. Finally, define the Watermark Duration.
It is recommended to use Watermarking on a Streaming DataFrame in case you're planning to use any of the following operations on it: