Visual Components

Component Toolbar

The component toolbar contains the built-in components provided by Prophecy.

Components in a Workflow

Components have input ports and output ports. Each port has a name, though most are simply called "in" and "out". You connect the output port of one component to the input port of the next component with an edge, which represents the flow of a Spark DataFrame. An output port can have many edges, connecting its output to many inputs, but an input port can have only one incoming edge.
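The two edge rules (an output port may fan out to many inputs, an input port accepts exactly one edge) can be modeled with a small graph structure. This is a minimal, hypothetical sketch for illustration only; it is not Prophecy's API, and the component names Source_1, Reformat_1, Filter_1 and Source_2 are made up.

```python
# Hypothetical model of the port/edge rules; not Prophecy's API.
from dataclasses import dataclass, field


@dataclass
class Graph:
    # (component, input_port) -> (component, output_port) that feeds it
    incoming: dict = field(default_factory=dict)
    # (component, output_port) -> list of (component, input_port) it feeds
    outgoing: dict = field(default_factory=dict)

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        target = (dst, in_port)
        # An input port accepts only one incoming edge.
        if target in self.incoming:
            raise ValueError(f"input port {dst}.{in_port} already has an incoming edge")
        self.incoming[target] = (src, out_port)
        # An output port may fan out to any number of input ports.
        self.outgoing.setdefault((src, out_port), []).append(target)


g = Graph()
g.connect("Source_1", "out", "Reformat_1", "in")
g.connect("Source_1", "out", "Filter_1", "in")  # fan-out from one output: allowed
try:
    g.connect("Source_2", "out", "Filter_1", "in")  # second edge into the same input
except ValueError as err:
    print(err)
```

Keying the incoming map by input port makes the "one incoming edge" rule a simple membership check, while the outgoing map stores a list so fan-out needs no special handling.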

Each component has a prominent, colored label on top. To edit it, double-click the component and rename it in the title of the dialog box. The label is also the name of the function that represents the component in code, so choose it thoughtfully. A component also has a type, shown in gray below the component.
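Because the label doubles as a function name, a label that is not a valid identifier in the target language is worth avoiding. The sketch below assumes the label is used verbatim as a Python function name; Prophecy may sanitize labels itself, so treat this only as a quick sanity check, not a description of Prophecy's behavior. The helper name is_valid_function_name is hypothetical.

```python
import keyword


def is_valid_function_name(label: str) -> bool:
    """Return True if `label` could be used verbatim as a Python function name."""
    # str.isidentifier() rejects spaces, hyphens and leading digits;
    # keyword.iskeyword() rejects reserved words such as "def" or "class".
    return label.isidentifier() and not keyword.iskeyword(label)


for label in ["Cleanup_Customers", "Join Orders", "2nd_join", "class"]:
    print(f"{label!r}: {is_valid_function_name(label)}")
```

Descriptive labels such as Cleanup_Customers also make the generated code easier to read than default names.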

Hover over a component to see a ... menu at the top right and a play button at the bottom right.

Clicking the ... menu opens the following options.

Option | Description
Rename | Changes the name (label) of the component, though it is usually easier to open the component and rename it there.
Change phase | Sets the phase, the number shown at the bottom left of the component icon (0 here). Components with a lower phase execute before components with a higher phase. This lets you order components that have no data flowing between them.
Delete | Deletes this component.
Detailed Stats | Computes more detailed statistics for the data leaving this component. This can help you choose partition keys or see the most common values.
Cache | Caches the DataFrame flowing out of this component. Repeated runs of downstream components then use the cached copy instead of re-executing the steps before this component.
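The phase rule can be pictured as a two-level ordering: all components of a lower phase run before any component of a higher phase, and within one phase the data edges decide the order. The sketch below models that, assuming phases run strictly one after another; it is not Prophecy's scheduler, and the function execution_order and the component names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import groupby


def execution_order(phases: dict[str, int], edges: list[tuple[str, str]]) -> list[str]:
    """Order components by phase, then by data edges within each phase.

    phases: component name -> phase number
    edges:  (upstream, downstream) pairs, one per data edge
    """
    order = []
    # sorted() is stable, so components in the same phase keep their insertion order.
    by_phase = sorted(phases, key=phases.get)
    for _phase, group in groupby(by_phase, key=phases.get):
        members = list(group)
        member_set = set(members)
        ts = TopologicalSorter()
        for m in members:
            ts.add(m)  # register every component, even those without edges
        for up, down in edges:
            # Only edges inside this phase constrain the order here; an edge
            # from a lower phase is already satisfied by running phases in order.
            if up in member_set and down in member_set:
                ts.add(down, up)
        order.extend(ts.static_order())  # raises CycleError on a cycle
    return order


phases = {"Source_1": 0, "Reformat_1": 0, "Target_1": 0, "Script_1": 1}
edges = [("Source_1", "Reformat_1"), ("Reformat_1", "Target_1")]
print(execution_order(phases, edges))
```

Script_1 has no data edge to the other components, so only its higher phase guarantees that it runs after Target_1.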

Inside a Component

Most components share a similar layout. Here is an image followed by an explanation of each section.

Section | Description
Left Panel | Shows the inputs to the component. Components with a variable number of input ports, such as MultiJoin or SetOperation (UnionAll), sometimes have a button to add more input ports.
Port Name | The name of the input port.
Previous Component | The name of the previous component whose output is connected to this input port.
Input Columns & Data Types | The names and data types of the columns arriving at each input port.
Selected Column | Click a column to select it; selected columns often appear in the right panel.
Right Panel | The main working area, where the component's business logic is defined; it covers most of the dialog. Here, it contains Target Columns and Expressions.
Language | The language in which you view and edit expressions. This is independent of the language in which the code is stored in Git. Most people prefer SQL here.
Expression Builder | Helps you write expressions quickly by suggesting built-in functions (and UDFs), column names, and operators. The suggestions always appear below the text being typed, so you can ignore them if they are not helpful. Pressing Escape hides them, though typing more text brings them back. This works with all languages.
Data Drawer | Pulls up a drawer from the bottom that covers half the dialog, where you can see the input and output data for this component without closing it.
Unit Test Drawer | Pulls up a drawer from the bottom that covers half the dialog, where you can view, edit, and run the unit tests for just this component.
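The idea behind a per-component unit test is that a component is a function from its input DataFrames to its output, so it can be checked in isolation. The sketch below illustrates this with a hypothetical UnionAll_1 component that takes a variable number of input ports; plain lists of dicts stand in for Spark DataFrames, and neither the function nor the test reflects the code Prophecy actually generates. The capitalized function name is unusual for Python and only mirrors the component label.

```python
# Hypothetical sketch: plain lists of dicts stand in for Spark DataFrames.
def UnionAll_1(*inputs: list[dict]) -> list[dict]:
    """Concatenate rows from any number of input ports (in0, in1, ...), keeping duplicates."""
    result = []
    for rows in inputs:
        result.extend(rows)
    return result


def test_UnionAll_1() -> None:
    in0 = [{"id": 1}, {"id": 2}]
    in1 = [{"id": 2}]
    in2 = []  # an empty input port contributes no rows
    out = UnionAll_1(in0, in1, in2)
    # UNION ALL keeps duplicate rows; this list-based sketch also preserves input order.
    assert out == [{"id": 1}, {"id": 2}, {"id": 2}]


test_UnionAll_1()
print("test_UnionAll_1 passed")
```

Taking *inputs lets the same function accept however many input ports are added in the left panel.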