Amazon EMR

Prophecy can use Amazon EMR as its Spark execution engine. This guide gives step-by-step instructions for creating a Fabric that connects Prophecy to your EMR environment.

Create Amazon EMR cluster with Apache Livy

Navigate to the Amazon EMR service and create a cluster. Under Application bundle, select Custom.

Choose the applications to include in your installation. At a minimum, make sure Livy and Spark are included.
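
The same cluster can also be created programmatically. The following is a minimal sketch using boto3's `run_job_flow`, not an official Prophecy procedure. The cluster name, release label, instance type, node count, and the default EMR role names are all assumptions; substitute the values used in your account.

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    release_label: str,
    instance_type: str = "m5.xlarge",  # assumed instance type
    node_count: int = 3,               # assumed: 1 primary + 2 core nodes
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Livy and Spark are the minimum applications this guide requires.
        "Applications": [{"Name": "Spark"}, {"Name": "Livy"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": node_count - 1,
                },
            ],
            # Keep the cluster running so Prophecy can attach to it later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed: the default EMR roles exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(region: str, request: Dict[str, Any]) -> str:
    """Submit the request and return the new cluster ID."""
    import boto3  # imported lazily so the request can be built offline

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

For example, `create_cluster("us-east-2", build_cluster_request("prophecy-emr", "emr-6.15.0"))` would submit the request; both the region and the release label here are placeholders.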

Allow network connectivity between Amazon EMR and Prophecy

Integration requires changes to the security groups of your EMR cluster, as described below. If you use Prophecy as a SaaS (Software as a Service) solution, the Prophecy public IP is 3.133.35.237; make sure the Core security group's outbound rules allow connections to this IP address.

  1. Modify the Primary Node (Master) security group:
    • Add an inbound rule that permits incoming traffic on port 8998, the default Livy port, from the Prophecy IP address.
  2. Modify the Core Node security group:
    • Add an outbound rule that allows outgoing HTTPS traffic to the Prophecy public IP.

With these rules in place, Prophecy can reach Livy on the primary node, and the core nodes can reach Prophecy over HTTPS.
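
The two security group rules can also be added with boto3. This is a sketch under stated assumptions: the security group IDs are placeholders you must look up for your cluster, the `/32` CIDR is derived from the SaaS IP quoted above, and HTTPS is taken to mean TCP port 443.

```python
from typing import Any, Dict

PROPHECY_SAAS_CIDR = "3.133.35.237/32"  # Prophecy SaaS public IP as a single-host CIDR
LIVY_PORT = 8998                        # default Livy port
HTTPS_PORT = 443                        # assumed port for HTTPS traffic


def tcp_rule(port: int, cidr: str, description: str) -> Dict[str, Any]:
    """Build one IpPermissions entry allowing TCP on a single port."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


def apply_rules(region: str, primary_sg_id: str, core_sg_id: str) -> None:
    """Add the inbound Livy rule and the outbound HTTPS rule."""
    import boto3  # imported lazily so the rules can be built offline

    ec2 = boto3.client("ec2", region_name=region)
    # Inbound on the primary node: Prophecy -> Livy.
    ec2.authorize_security_group_ingress(
        GroupId=primary_sg_id,
        IpPermissions=[tcp_rule(LIVY_PORT, PROPHECY_SAAS_CIDR, "Prophecy to Livy")],
    )
    # Outbound on the core nodes: cluster -> Prophecy over HTTPS.
    ec2.authorize_security_group_egress(
        GroupId=core_sg_id,
        IpPermissions=[tcp_rule(HTTPS_PORT, PROPHECY_SAAS_CIDR, "Core nodes to Prophecy")],
    )
```

If a security group already allows all outbound traffic, the egress rule is redundant and can be skipped.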

Create a Fabric to connect Prophecy to EMR

Navigate to Prophecy's UI and click on Create Fabric. The Fabric establishes a connection with your EMR cluster and uses it as the execution engine for your Pipelines.

Name your EMR Fabric and click on Continue.



Choose EMR as your Provider.

Before proceeding, make sure that all the required settings are configured. If you are unsure which configurations are needed, ask your cloud infrastructure team for guidance.

Enter your AWS credentials under Access Key and Secret Key. Choose the Region that your EMR cluster is running in.

Click on Fetch environments.
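
If no environments appear, it can help to check outside Prophecy that the credentials can see the cluster in the chosen Region. The sketch below uses boto3's `list_clusters`; it is an independent check, not a description of what Fetch environments does internally, and it reads only the first page of results.

```python
from typing import Any, Dict, List

# Cluster states that indicate a cluster is starting or usable.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def summarize_clusters(response: Dict[str, Any]) -> List[str]:
    """Turn a list_clusters response into 'id  name  state' lines."""
    return [
        f"{c['Id']}  {c['Name']}  {c['Status']['State']}"
        for c in response.get("Clusters", [])
    ]


def fetch_active_clusters(access_key: str, secret_key: str, region: str) -> List[str]:
    """List active EMR clusters visible to the given credentials."""
    import boto3  # imported lazily so summarize_clusters works offline

    emr = boto3.client(
        "emr",
        region_name=region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    return summarize_clusters(emr.list_clusters(ClusterStates=ACTIVE_STATES))
```

An empty result suggests a wrong Region, missing EMR permissions on the credentials, or a cluster that has already terminated.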

Under Spark Environment, select the EMR cluster that you want to connect to.

Most of the fields should be populated automatically after you select an EMR cluster. Enter the S3 path where you would like your logs to persist.

Add the Job size to your environment by clicking on Add Job Size. Configure your Job size and click on Add.

Configure the Prophecy Library:

Select File System under Scala Resolution mode and provide the path:

s3://prophecy-public-bucket/prophecy-libs

Select File System under Python Resolution mode and provide the path:

s3://prophecy-public-bucket/python-prophecy-libs
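
To confirm that these library paths are readable from your account, one option is to split each S3 URI into bucket and prefix and list a single object under it. This is a sketch only: it assumes your default AWS credentials are allowed to list the bucket, which this guide does not state.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def prefix_has_objects(uri: str) -> bool:
    """Return True if at least one object exists under the URI."""
    import boto3  # imported lazily so parse_s3_uri works offline

    bucket, prefix = parse_s3_uri(uri)
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0
```

The same helper can be pointed at the S3 log path entered earlier to catch a mistyped bucket name before running a Pipeline.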

Click on Complete and your EMR Fabric is ready!

Run a simple Pipeline and make sure that the interims return data properly.
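
If the Pipeline does not start, a direct request to Livy's REST API can help separate network problems from Fabric configuration problems. The sketch below calls Livy's `GET /sessions` endpoint on port 8998. It assumes Livy is served over plain HTTP without authentication, and that the machine running it is allowed through the primary node's security group; the host value is a placeholder.

```python
import json
from urllib.request import urlopen


def livy_sessions_url(host: str, port: int = 8998) -> str:
    """Build the URL of Livy's session-listing endpoint."""
    return f"http://{host}:{port}/sessions"


def count_livy_sessions(host: str, timeout: float = 5.0) -> int:
    """Return the number of Livy sessions, raising if Livy is unreachable."""
    with urlopen(livy_sessions_url(host), timeout=timeout) as response:
        # Livy answers with a JSON object that includes a 'total' field.
        return json.load(response)["total"]
```

A timeout here usually points to the security group rules, while a successful response with a failing Pipeline points to the Fabric settings.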