Amazon EMR
In the context of Spark execution engines, you have the flexibility to opt for Amazon EMR. This guide offers step-by-step instructions on creating a Fabric that enables seamless connectivity to the EMR environment.
Create Amazon EMR cluster with Apache Livy
Navigate to Amazon EMR service and create a cluster. Under Application bundle select Custom.
Choose appropriate applications to include in your installation. At a minimum, make sure sure Livy and Spark are included in the install.
Allow network connectivity between Amazon EMR and Prophecy
To configure the necessary network settings for seamless integration, specific modifications to the security groups of your EMR cluster are required. Follow the instructions outlined below:
If you intend to utilize Prophecy as a SaaS (Software as a Service) solution, note that the Prophecy public IP is 3.133.35.237
. Ensure that the Core security group's outbound rule allows connections to this IP address.
- Modify the Primary Node security group:
- Allow incoming connections to port
8998
from the Prophecy IP. - This can be achieved by adding an inbound rule to the Master security group that permits incoming traffic on port
8998
from the Prophecy IP address.
- Allow incoming connections to port
- Modify the Core Node security group:
- Allow outgoing connections to the Prophecy public IP over HTTPS.
- To enable this, add an outbound rule to the Core security group that allows outgoing traffic over HTTPS protocol to the Prophecy public IP.
By implementing these adjustments to your EMR cluster's security groups, you will establish the necessary network configuration for effective integration with Prophecy.
Create a Fabric to connect Prophecy to EMR
Navigate to Prophecy's UI and click on Create Fabric. The Fabric will establish a connection with your EMR cluster and use it as the execution engine for your Pipelines.
Name your EMR Fabric and click on Continue.
Choose EMR as your Provider.
Before proceeding, it is crucial to ensure that all the required settings are properly configured. If you are uncertain about the necessary configurations, we recommend reaching out to your cloud infrastructure team for additional guidance and assistance. They will be able to provide you with the specific information and support needed to ensure a successful setup.
Enter your AWS credentials under Access Key and Secret Key. Choose the Region that your EMR cluster is running in.
Click on Fetch environments.
Select your EMR cluster that you would like to connect to under Spark Environment.
Most of the fields should be automatically populated after selecting a EMR cluster. Enter the S3 path that points to the location where you would like your logs to persist.
Add the Job size to your environment by clicking on Add Job Size. Configure your Job size and click on Add.
Configure the Prophecy Library:
Select File System under Scala Resolution mode and provide the following path:
s3://prophecy-public-bucket/prophecy-libs
Select File System under Python Resolution mode and provide the following path:
s3://prophecy-public-bucket/python-prophecy-libs
Click on Complete and your EMR Fabric is ready!
Run a simple Pipeline and make sure that the interim returns data properly.