Prophecy introduces Package Hub, which enables data practitioners to create and share Pipeline components.
A Package is a versioned Project that can be shared across teams. As such, a Package can contain Pipeline templates, custom Gems, functions, subgraph templates, etc - a reusable version of everything a Project contains. Package dependencies allow us to re-use components so we don’t have to rebuild them. The coding community has been using packages for ages, and finally the low-code community can take advantage of the same idea. Packages are shareable within and across teams. For extra visibility, the Package Hub is a curated selection of Packages that your teams create and publish for other users to leverage.
Just include a Package as a dependency to take advantage of its contents. See the sections below for step-by-step instructions on how to use, build, and share Packages. For those looking for a deeper dive on building packages, see the Package Builder page.
Use a package
Open the helloworld project or create a new project. Notice the dependencies include
On opening a NewPipeline, notice all the transformation dropdown Gems are from
SparkBasics package. Drag and drop a Gem, eg Deduplicate, from SparkBasics package to use in a NewProject Pipeline.
Already you are employing an engineering best practice: package dependencies! When we say that a project USES a package, another way to say it is “a Project has a dependency on the Package.”
NewProject has a dependency on SparkBasics Package. But don’t stop there. You’ll want to use packages shared by your teammates!
Adding a Package to your Project is easy. From a Pipeline, just (1)Add a new Package. Now the Package and all of the contents - Pipeline templates, custom Gems, jobs, UDFs, etc - will be accessible in your Pipeline and the entire Project. Want to see the list of Packages available for your Project? From the (2)Project page, click (3)Dependencies to see the Packages listed as Dependencies for your Project. (4)Add or remove dependencies using the dropdown.
Another great way to find Packages is via the (1)Package Hub. This Hub serves to give visibility to the best, curated Packages that Teams want to highlight. Therefore, only Published Packages appear in the PackageHub. Search for a Package by search string, or filter by content, type, or team. (2)Select a package card and (3)Use the Package as a dependency to a new or existing project.
A Package can contain Pipeline templates, custom Gems, Subgraph templates, Functions, and more. In fact, a Package can contain anything a Project contains. Why? A Package is a Project that has been Released with a version.
A Package is a set of logic to apply in various situations. Any changes to this logic would be done via versioned package release. For this reason, the projects that USE the package will not be able to change the package.
Let’s diagram a slightly more complex scenario that represents how your teams might use Packages. (So far we’ve considered
SparkBasics, a Project created by the engineering team at Prophecy which only contains the Gems we already know and love.) We need a slightly more complex scenario to explore re-using Pipeline templates and custom Gems. (Detailed pages exploring re-usable subgraphs, UDFs, and jobs will be available in the near future.)
Consider now the
Framework Project and two Projects that have dependencies on the
MarketingLeads Project lists package
Framework as a dependency, any Pipelines from
Framework will be accessible in
MarketingLeads by passing configurable variables. The
MarketingLeads Project cannot change the Framework Pipelines. This is a good thing - the Platform team decides how
Framework will evolve.
The MarketingAnalytics team can create a (1)new Pipeline configuration and (2)name it
LeadsCleanup. Then (3)select that configuration, perhaps change the variable “table” value from default to a more interesting (4)value for their use case.
LeadsCleanup is a configuration instance of the
GenericCleanup Pipeline template.
Now, in the same
CustomersModeling Project, (1)select the desired configuration instance, and (2)run the Pipeline with that configuration. Here the
GenericCleanup Pipeline is a readonly template, so the MarketingAnalytics team gets to take advantage of the logic while passing configurations relevant for their own use cases.
If a change is needed in the
Framework Pipelines, the changes must be made in the
Framework Project. When the
Framework Project is released with the changes,
MarketingLeads Project can (3)update the dependency. With
Framework v0.0.2, the
GenericCleanup Pipeline has been updated with (4)additional Gems which can now be used with the
Re-using custom Gems
Configuring Pipelines is just one way to take advantage of the shared, reusable components in a Package. Another (very powerful) way is to use custom Gems.
If the MarketingAnalytics team wishes to use one of the custom Gems from
Framework in a new Pipeline, just (1)drag-and-drop the Gem, eg DataMasking , to the Pipeline canvas. (2)Visually configure the Gem and execute the Pipeline as you would for any Transformation Gem.
Likewise, new Dataset Format Gems from
Framework can be used in a new Pipeline.
Select from the (1)source/target dropdown menu, and (2)select the new Dataset format card from the options. The Platform and MarketingAnalytics teams are using the industry best-practice for re-usable components.
Build a package
A package is simply a released project. The project can contain Pipeline templates, functions, subgraph templates, custom Gems, etc. The most important of these components are custom Gems, so we’ll focus on building a Package with a Custom Gem here.
Start by (1)creating a new Project called
SecurityPython. There are no changes needed; use the default settings to (2)Complete Project creation.
From the Project Browser, (1)Create a new Gem. Give the Gem a (2)name like
Encrypt. This will be a Transformation Gem; another time try creating a different Gem mode or category.
Next specify the Gem’s code.
When you create a new Gem, a (1)code guide appears. Use the guide or replace with your (2)custom code. Explore the Gem Structure to understand the code requirements. Then go to the (3)Visual View and make sure the UI appears with no errors. Now the Custom Gem is ready to try in the canvas.
(1)Click and drag the custom Gem Encrypt (in the list of Gems within the SecurityPython Project) to a new or existing Pipeline canvas. (2) Configure the custom Gem as with any other Gem: (3)define expressions, check the (4)output column list, even (5)infer the schema based on the attached Fabric. (6)Save the configured Gem to start using it.
(1)Run the custom Gem to check whether the functionality works as expected. Click (2)Data to view the data input and (3)output preview. The
customer_id is encrypted after our function is applied.
Click here for a deep dive on building Packages with Custom Gems.
Once you have tested your Gem in the canvas, and you are happy with both the Gem UI Component and Gem Code Logic, you will want to release a tagged version.
(1) Commit and (2)Release the Project, including the Custom Gem. Continue developing and releasing new versions as incremental improvements!
The release must be made from the branch specified on project creation, usually main or master.
This implementation ensures teams will review code before releasing.
Now we have built a package! Now other projects owned by the same team can use the package as a versioned dependency.
Share a package
The entire Project is (1)packaged into a jar or wheel file and uploaded to Prophecy’s artifactory. Once the Package is successfully released, it can be shared with specific teams or highlighted by publishing to the Package Hub.
Select the (2) Metadata tab, then select your (3)Project’s page. Then select (4)Access, select the (5)Teams that should have access to the package. (6)Send an invitation to those teams. For the team’s best packages that are intended for broad consumption, (7)Publish to the Package Hub.
If another team wants to use the
SecurityPython package in their newProject, just add
SecurityPython as a dependency for
newProject. Now the user can see and use the
Encrypt custom Gem.
We have built and shared our package with a custom
Administer using HubManager (coming post 3.2)
Admin users can manage which packages show up on the PackageHub, and which users can access
With Package Hub, Platform teams can build solutions once, and re-use the logic for additional cases. Errors can be handled just once. Updated Packages take advantage of logic updates and error-fixes. Data Analyst teams will build Pipelines faster by taking advantage of the building blocks - custom Gems, UDFs - provided by the Platform team. Or, Data Analyst teams might create their own building blocks! Everyone can create and share Pipeline components with Package Hub. We’d love to hear how your teams are creating, sharing, and even curating Packages. Schedule some time with us - we would LOVE to hear your feedback!
Q: I shared a dependency to a team, but we don’t see any email notification. A: The share “invitation” is not sent via email. Rather, the team would be able to add that project as a dependency to their new or existing projects.
Q: I shared a dependency to a team, but they don’t see the package listed when they try to add as a dependency. A: Be sure the new project and dependent project are using the same language, eg Scala or Python. If the new project is a Scala project, only Scala Packages can be added as dependencies.
Q: How do I use subgraphs from a dependency? A: Like Pipelines, Subgraphs can be used by dependent projects via configurable variables.
Q: Does Prophecy provide a DataQuality Package? A: Not yet, but coming soon!
Q: How would I take a Package built by another team and evolve it in a new direction? That is, how can two teams take a single project in two different directions?
A: One excellent option here is to Clone the Project instead of use as a Package Dependency. For example, let’s say the MarketingAnalytics team loved the
Framework Package and wanted it to become a holding place for all the Marketing Pipelines. The Platform team, however, wants to keep the
Framework Project as general as possible. How can the two teams each take a version of the project in their own direction?
Just (1)Clone the
Framework Project, and (2)provide a new name, team, Git account, and repository.
Now the MarketingAnalytics team has their own Project,
MarketingFramework, to evolve in their own direction. Notice the Pipelines are not templates, rather the
GenericCleanup Pipeline can be edited by adding a new Gem, eg Repartition: