Amazon EMR

Prophecy supports using Amazon EMR via Livy as your Spark execution engine for running pipelines.

This page includes steps to configure both Amazon EMR and Amazon EMR Serverless fabrics.

Create Amazon EMR cluster with Apache Livy

In your Amazon EMR service, create a cluster. When doing so:

  1. Under Application bundle select Custom.

  2. When selecting applications, make sure Livy and Spark are included in the install.
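If you prefer to script cluster creation instead of using the console, the two choices above can be sketched with boto3. This is a minimal sketch, not Prophecy's own tooling: the release label, instance types, instance counts, role names, cluster name, and subnet ID are all placeholder assumptions that you should replace with values from your own AWS account. The only part taken from the steps above is the application list, which includes Spark and Livy.

```python
from typing import Any, Dict


def build_cluster_request(name: str, subnet_id: str) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow API with Spark and Livy."""
    return {
        "Name": name,
        # Assumption: any release that ships both Spark and Livy works here.
        "ReleaseLabel": "emr-6.15.0",
        # Scripted equivalent of choosing the Custom bundle with Spark and Livy.
        "Applications": [{"Name": "Spark"}, {"Name": "Livy"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # placeholder size
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # placeholder size
                    "InstanceCount": 2,
                },
            ],
            "Ec2SubnetId": subnet_id,
            # Keep the cluster running so Livy stays available for sessions.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles exist in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: Dict[str, Any], region: str) -> str:
    """Submit the request to EMR and return the new cluster ID."""
    import boto3  # imported lazily so the request can be built without AWS access

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical name and subnet ID, used only to show the request shape.
    request = build_cluster_request("prophecy-emr", "subnet-0123456789abcdef0")
    print([app["Name"] for app in request["Applications"]])
```

Building the request separately from submitting it lets you review the payload before any AWS resources are created.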

EMR create cluster

Configure network settings

To make sure that EMR can communicate with Prophecy, you need to configure specific network settings. Specifically, you need to modify the security groups of your EMR cluster.

  1. Modify the primary node security group to allow incoming connections on port 8998 (the default Livy port) from the Prophecy IP. To do so, add an inbound rule to the primary (master) node security group that permits TCP traffic on port 8998 from the Prophecy IP address.

  2. Modify the core node security group to allow outgoing connections to the Prophecy public IP 3.133.35.237 over HTTPS. To do so, add an outbound rule to the core node security group that permits HTTPS traffic to that IP address.
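The two security group rules above could also be applied with boto3, roughly as follows. Treat this as a sketch under assumptions: the security group IDs and region are values you supply, the inbound source IP is passed in as a parameter because this page does not state which Prophecy IP to use for inbound traffic, and HTTPS is assumed to mean TCP port 443. The helper names `tcp_rule` and `apply_rules` are illustrative only.

```python
from typing import Any, Dict

LIVY_PORT = 8998                     # port named in step 1
HTTPS_PORT = 443                     # assumption: HTTPS on its standard port
PROPHECY_PUBLIC_IP = "3.133.35.237"  # IP named in step 2


def tcp_rule(port: int, ip: str, description: str) -> Dict[str, Any]:
    """Describe a single-port TCP rule restricted to one IPv4 address."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{ip}/32", "Description": description}],
    }


def apply_rules(primary_sg_id: str, core_sg_id: str,
                prophecy_ip: str, region: str) -> None:
    """Add the inbound Livy rule and the outbound HTTPS rule."""
    import boto3  # imported lazily so the rules can be inspected offline

    ec2 = boto3.client("ec2", region_name=region)
    # Step 1: allow Prophecy to reach Livy on the primary node.
    ec2.authorize_security_group_ingress(
        GroupId=primary_sg_id,
        IpPermissions=[tcp_rule(LIVY_PORT, prophecy_ip, "Prophecy to Livy")],
    )
    # Step 2: allow core nodes to reach Prophecy over HTTPS.
    ec2.authorize_security_group_egress(
        GroupId=core_sg_id,
        IpPermissions=[
            tcp_rule(HTTPS_PORT, PROPHECY_PUBLIC_IP, "Core nodes to Prophecy")
        ],
    )
```

Using a `/32` CIDR limits each rule to a single address rather than opening the port to a wider range.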

Create a fabric

To connect EMR and Prophecy, you must create a fabric. You can create either an EMR fabric (recommended) or a Livy fabric.

To create an EMR fabric:

  1. Open Prophecy and click Create Entity in the left navigation menu. Then, click the Fabric tile.

  2. Name your fabric and click Continue.

  3. Keep the Provider Type as Spark, and choose EMR as the Provider.

  4. Choose an authentication method (Static or SAML).

    • Static: Enter your AWS credentials under Access Key and Secret Key. Then, enter the region that your EMR cluster runs in.

    • SAML: Use an Okta application to authenticate the EMR connection. Learn more in Configure EMR SAML authentication with Okta.

  5. Click Fetch environments.

  6. Under Spark Environment, select the EMR cluster that you would like to connect to.

  7. Enter the S3 path that points to the location where you would like your logs to persist.

EMR dependencies

  8. Add the job size to your environment by clicking Add job Size. Configure your job size and click Add.

  9. Select File System under Scala Resolution mode and input s3://prophecy-public-bucket/prophecy-libs

  10. Select File System under Python Resolution mode and input s3://prophecy-public-bucket/python-prophecy-libs

  11. Click Complete to save your new EMR fabric.
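Before filling in the fabric form, it can help to check the values you plan to enter. The sketch below is an optional sanity check, not a description of how Prophecy fetches environments: `split_s3_uri` validates that a path such as the log location or a library path is a well-formed S3 URI, and `list_active_clusters` lists the clusters visible in a region using whatever AWS credentials boto3 finds in your environment. Both function names are hypothetical helpers introduced here for illustration.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key prefix), or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_active_clusters(region: str) -> List[str]:
    """Return 'id name' strings for running or waiting EMR clusters."""
    import boto3  # imported lazily so the URI check works without AWS access

    emr = boto3.client("emr", region_name=region)
    response = emr.list_clusters(ClusterStates=["RUNNING", "WAITING"])
    return [f'{c["Id"]} {c["Name"]}' for c in response["Clusters"]]


if __name__ == "__main__":
    # The two library paths given in the steps above.
    for uri in (
        "s3://prophecy-public-bucket/prophecy-libs",
        "s3://prophecy-public-bucket/python-prophecy-libs",
    ):
        print(split_s3_uri(uri))
```

If `list_active_clusters` does not return the cluster you expect, the region or the credentials are the first things to recheck.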