Amazon EMR

This page outlines how to use Amazon EMR via Livy as your Spark execution engine in Prophecy.

These instructions work for both Amazon EMR and Amazon EMR Serverless.

Create an Amazon EMR cluster with Apache Livy

In your Amazon EMR service, create a cluster. When doing so:

  1. Under Application bundle select Custom.
  2. When selecting applications, make sure Livy and Spark are included in the install.
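If you create clusters programmatically rather than through the console, the same application selection can be sketched with boto3. This is a minimal sketch, not an official Prophecy procedure: the cluster name, release label, instance type, instance count, and the IAM role names (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) are assumptions that you would replace with your own values.

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    release_label: str,
    instance_type: str,
    instance_count: int,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR run_job_flow API call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Mirrors selecting the Custom bundle and including Livy and Spark.
        "Applications": [{"Name": "Spark"}, {"Name": "Livy"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running so Livy stays reachable.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default role names; use the roles defined in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: Dict[str, Any], region: str) -> str:
    """Submit the request and return the new cluster ID (requires boto3)."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_cluster_request("prophecy-livy", "emr-6.15.0", "m5.xlarge", 3)
    print([app["Name"] for app in request["Applications"]])
```

Calling `create_cluster(request, "us-east-2")` would submit the request with whatever AWS credentials boto3 resolves in your environment; the region shown is only an example.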

Configure network settings

To make sure that EMR can communicate with Prophecy, you need to configure specific network settings. Specifically, you need to modify the security groups of your EMR cluster.

  1. Modify the primary node security group (also called the Master security group) to allow incoming connections from Prophecy. To do so, add an inbound rule that permits traffic on port 8998, the default Livy port, from the Prophecy IP address.
  2. Modify the core node security group to allow outgoing connections to Prophecy. To do so, add an outbound rule that permits HTTPS traffic to the Prophecy public IP 3.133.35.237.
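The two rules above can also be sketched with boto3. This is an illustrative sketch under a few assumptions: the security group IDs are placeholders, HTTPS is taken to mean TCP port 443, and the inbound rule uses the same Prophecy address as the outbound rule (confirm the inbound address for your deployment).

```python
import ipaddress
from typing import Any, Dict, List

LIVY_PORT = 8998   # default Livy port, opened on the primary node
HTTPS_PORT = 443   # assumed port for the outbound HTTPS rule


def tcp_rule(port: int, ip: str, description: str) -> List[Dict[str, Any]]:
    """Build an IpPermissions list allowing TCP on one port for one IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError if the address is malformed
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": f"{ip}/32", "Description": description}],
        }
    ]


def apply_rules(primary_sg_id: str, core_sg_id: str, prophecy_ip: str, region: str) -> None:
    """Add the inbound and outbound rules (requires boto3 and EC2 permissions)."""
    import boto3  # imported here so tcp_rule works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    # Inbound: Prophecy -> Livy on the primary node.
    ec2.authorize_security_group_ingress(
        GroupId=primary_sg_id,
        IpPermissions=tcp_rule(LIVY_PORT, prophecy_ip, "Prophecy to Livy"),
    )
    # Outbound: core nodes -> Prophecy over HTTPS.
    ec2.authorize_security_group_egress(
        GroupId=core_sg_id,
        IpPermissions=tcp_rule(HTTPS_PORT, prophecy_ip, "Core nodes to Prophecy"),
    )


if __name__ == "__main__":
    print(tcp_rule(LIVY_PORT, "3.133.35.237", "Prophecy to Livy"))
```

Restricting each rule to a single `/32` address keeps the Livy port closed to everything except Prophecy.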

Create a Fabric

To connect EMR and Prophecy, you must create a Fabric. You can create either an EMR Fabric (recommended) or a Livy Fabric.

To create an EMR Fabric:

  1. Open Prophecy and click Create Entity from the left navigation menu. Then, click on the Fabric tile.
  2. Name your Fabric and click Continue.
  3. Keep the Provider Type as Spark, and choose EMR as the Provider.
  4. Enter your AWS credentials under Access Key and Secret Key. Then, enter the region that your EMR cluster is running in.
  5. Click on Fetch environments.
  6. Under Spark Environment, select the EMR cluster that you would like to connect to.
  7. Enter the S3 path that points to the location where you would like your logs to persist.
  8. Add the Job size to your environment by clicking on Add Job Size. Configure your Job size and click on Add.
  9. Select File System under Scala Resolution mode and input s3://prophecy-public-bucket/prophecy-libs
  10. Select File System under Python Resolution mode and input s3://prophecy-public-bucket/python-prophecy-libs

Click Complete to save your new EMR Fabric.
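If the cluster you expect does not appear under Spark Environment, it can help to check which clusters your credentials can see in the chosen region. The following is a minimal sketch using the boto3 EMR `list_clusters` paginator. It is not a description of how Prophecy fetches environments, and the choice to filter on the `RUNNING` and `WAITING` states is an assumption.

```python
from typing import Any, Dict, List

# Assumed filter: only clusters that can currently accept work.
ACTIVE_STATES = ["RUNNING", "WAITING"]


def list_active_clusters(emr_client: Any) -> List[Dict[str, str]]:
    """Return the ID, name, and state of each active cluster visible to the client."""
    clusters: List[Dict[str, str]] = []
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for cluster in page["Clusters"]:
            clusters.append(
                {
                    "id": cluster["Id"],
                    "name": cluster["Name"],
                    "state": cluster["Status"]["State"],
                }
            )
    return clusters


def main(region: str) -> None:
    """Print active clusters in a region (requires boto3 and AWS credentials)."""
    import boto3  # imported here so list_active_clusters can be used with any client

    for cluster in list_active_clusters(boto3.client("emr", region_name=region)):
        print(cluster["id"], cluster["name"], cluster["state"])
```

Running `main("us-east-2")` with the same access key and secret key you entered in the Fabric should list the cluster you want to select; the region here is only an example.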

At this point, you can test your Fabric. Open a project, connect to a cluster, and try to run a pipeline!
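If the connection fails, one way to narrow down the cause is to query the Livy REST API directly. The sketch below calls Livy's `GET /sessions` endpoint on port 8998 using only the Python standard library. The host name is a placeholder for your cluster's primary node DNS name, and the request only succeeds from a machine whose IP address is allowed by the inbound rule on the primary node security group.

```python
import json
import urllib.request
from typing import Any, Dict


def livy_sessions_url(host: str, port: int = 8998) -> str:
    """Build the URL of the Livy endpoint that lists interactive sessions."""
    return f"http://{host}:{port}/sessions"


def count_sessions(payload: Dict[str, Any]) -> int:
    """Count the sessions in a Livy GET /sessions response body."""
    return len(payload.get("sessions", []))


def check_livy(host: str, port: int = 8998, timeout: float = 5.0) -> int:
    """Return the number of Livy sessions; raises URLError if Livy is unreachable."""
    with urllib.request.urlopen(livy_sessions_url(host, port), timeout=timeout) as resp:
        return count_sessions(json.load(resp))


if __name__ == "__main__":
    # Placeholder host; replace with your primary node's DNS name.
    print(livy_sessions_url("primary-node.example.com"))
```

A timeout from `check_livy` usually points to a security group or routing problem, while a successful response indicates that Livy itself is up and listening.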