Machine Learning
Each page below describes one of the Gems that prepare or use data for Machine Learning. These Gems require a Cluster library dependency. For an example set of Pipelines that use these Gems to create a Generative AI Chatbot, see this guide.
📄️ Text Processing
Text processing to prepare data to submit to a foundational model API.
📄️ OpenAI
Request that OpenAI generate a vector embedding or answer a question with an optional context.
📄️ Pinecone Lookup
Look up a vector embedding from a Pinecone Database.
Cluster library dependencies
Spark-AI - Toolbox for building Generative AI applications on top of Apache Spark. This library dependency is required for the Machine Learning Gem functionality. Set up this dependency in one of two ways:
Option a. Import a project with the spark-ai dependency preconfigured, as in this guide.
Option b. Alternatively, add prophecy-spark-ai==0.1.8 as a Python library and io.prophecy:spark-ai_2.12:0.1.8 as a Maven library to the Pipeline dependencies.
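As a quick sanity check for Option b, the sketch below parses the two coordinates quoted above and tests whether the pinned Python library is installed in the current environment. It is only an illustration: the helper names (parse_pin, installed_version, pin_is_satisfied, maven_version) are hypothetical and not part of Spark-AI, and it assumes the installed distribution is registered under the same name as the pin. It does not inspect the Maven library, which lives on the cluster's JVM classpath.

```python
from importlib import metadata

# Coordinates copied from the text above; change them if your project pins
# a different version.
PYTHON_REQUIREMENT = "prophecy-spark-ai==0.1.8"
MAVEN_COORDINATE = "io.prophecy:spark-ai_2.12:0.1.8"


def parse_pin(requirement):
    """Split an exact 'name==version' pin into (name, version)."""
    name, sep, version = requirement.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not an exact pin: {requirement!r}")
    return name.strip(), version.strip()


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pin_is_satisfied(requirement):
    """True only if the pinned distribution is installed at that exact version."""
    name, wanted = parse_pin(requirement)
    return installed_version(name) == wanted


def maven_version(coordinate):
    """Return the version part of a 'group:artifact:version' coordinate."""
    _group, _artifact, version = coordinate.split(":")
    return version


if __name__ == "__main__":
    name, wanted = parse_pin(PYTHON_REQUIREMENT)
    print(f"{name}: wanted {wanted}, found {installed_version(name)}")
    # The Python and Maven pins in the text share a version; flag any drift.
    print("versions match:", wanted == maven_version(MAVEN_COORDINATE))
```

Running this in a notebook cell attached to the cluster gives a fast signal on whether the Python side of the dependency is in place before a Machine Learning Gem is executed.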
Click here to see the roadmap for what's coming in the future with the Spark-AI toolbox.