Salesforce

Built on

This connector is built on top of the existing Spark Salesforce library.

Install the com.springml:spark-salesforce_2.12:1.1.4 Maven external dependency on your cluster. To learn about installing dependencies in Prophecy UI, see Spark dependencies.

With the Source and Target gem, you can perform the following with Salesforce:

  • Create datasets in Salesforce Wave from a Spark DataFrame.
  • Read a Salesforce Wave dataset by providing a SAQL query. The Source gem returns the query result as a DataFrame.
  • Read a Salesforce object by providing a SOQL query. The Source gem returns the query result as a DataFrame.
  • Update a Salesforce object with the contents of a DataFrame using the Target gem.

Prerequisites

Before you specify parameters and properties, select the Salesforce application:

  1. Open the gem configuration.
  2. On the Type & Format page, navigate to the Applications tab.
  3. Select Salesforce.

Parameters

| Parameter | Tab | Description |
| --- | --- | --- |
| Credentials | Location | How to provide your credentials. You can select: Databricks Secrets, Username & Password, or Environment variables. |
| User Name | Location | Salesforce Wave username. This user must have privileges to upload datasets or execute SAQL or SOQL. |
| Password | Location | Salesforce Wave password. Append your security token to your password. To reset your Salesforce security token, see Reset Your Security Token. |
| Login URL | Location | Salesforce login URL. Default: `https://login.salesforce.com`. |
| Data Source | Location | Strategy to read data in the Source gem. Possible values: SAQL or SOQL. |
| SAQL Query | Location | SAQL query used to query Salesforce Wave when you select SAQL as the Data Source. |
| SOQL Query | Location | SOQL query used to query a Salesforce object when you select SOQL as the Data Source. |
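For reference, a Source gem configured with SAQL as the Data Source compiles to a read similar to the sketch below. This is a hedged sketch, not generated output: the option keys (`saql`, `username`, `password`) follow the springml spark-salesforce library's README, and the Wave dataset name and query are placeholders, so verify them against the library version installed on your cluster.

```python
# Hedged sketch: reading a Salesforce Wave dataset with a SAQL query.
# Option keys are assumed from the springml spark-salesforce README.
saql_options = {
    "username": "your_salesforce_username",
    # Remember to append your security token to the password.
    "password": "your_salesforce_password_with_security_token",
    # Placeholder SAQL query over a hypothetical Wave dataset "opportunities".
    "saql": 'q = load "opportunities"; q = foreach q generate Id, Amount;',
}

def read_salesforce_wave(spark):
    # Apply each option to the reader before loading the query result
    # as a DataFrame.
    reader = spark.read.format("com.springml.spark.salesforce")
    for key, value in saql_options.items():
        reader = reader.option(key, value)
    return reader.load()
```

Reader options are plain strings, so collecting them in a dictionary keeps the credentials and query in one place if you later swap them for Databricks Secrets or environment variables.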

Source

The Source gem reads data from Salesforce objects and allows you to optionally specify the following additional properties.

Source properties

| Property | Description | Default |
| --- | --- | --- |
| Description | Description of your dataset. | None |
| Primary key chunking (Optional) | Whether to enable automatic primary key chunking for the bulk query job. This splits the bulk query into separate batches of the size defined by the Chunk size property. | false |
| Chunk size | Number of records to include in each batch. You can only use this property when you enable Primary key chunking. Maximum size is 250000. | 100000 |
| Timeout | Maximum time spent polling for the completion of the bulk query job. You can only use this property when you enable bulk query. | None |
| Max Length of column (Optional) | Maximum length of a column. You can only use this property when you enable bulk query. | 4096 |
| External ID field name for Salesforce Object (Optional) | Name of the external ID field in a Salesforce object. | Id |
| Enable bulk query (Optional) | Whether to enable bulk query. This is the preferred method when loading large sets of data. Salesforce processes batches in the background. | false |
| Retrieve deleted and archived records (Optional) | Whether to retrieve deleted and archived records in SOQL queries. | false |
| Infer Schema | Whether to infer the schema from the query results. The Source gem samples rows to determine the datatypes. | false |
| Date Format | Format string for `java.text.SimpleDateFormat` to follow when reading timestamps. This applies to TimestampType. | null |
| Salesforce API Version (Optional) | Version of the Salesforce API to use. | 35.0 |
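As a sketch of how the bulk-query properties above map onto reader options: the option names below (`bulk`, `pkChunking`, `chunkSize`, `timeout`, `sfObject`) are assumptions taken from the springml library's README and may differ between versions, and the object, query, and timeout value are placeholders.

```python
# Hedged sketch: bulk query with primary key chunking enabled.
# All option values are strings, as Spark reader options expect.
bulk_options = {
    "bulk": "true",         # Enable bulk query (assumed option name)
    "pkChunking": "true",   # Enable automatic primary key chunking
    "chunkSize": "250000",  # Batch size; 250000 is the documented maximum
    "timeout": "600000",    # Polling timeout; placeholder value
    "sfObject": "Lead",     # Object the bulk job runs against (assumed key)
}

def read_leads_bulk(spark):
    # Base reader with credentials and the SOQL query, then the
    # bulk-specific options layered on top.
    reader = spark.read.format("com.springml.spark.salesforce") \
        .option("username", "your_salesforce_username") \
        .option("password", "your_salesforce_password_with_security_token") \
        .option("soql", "select Id, Name, Email from Lead")
    for key, value in bulk_options.items():
        reader = reader.option(key, value)
    return reader.load()
```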

Example

The following example uses a SOQL query to read the Leads dataset from Sales Cloud.

Compiled code

tip

To see the compiled code of your project, switch to the Code view in the project header.

```python
def read_salesforce(spark: SparkSession) -> DataFrame:
    return spark.read \
        .format("com.springml.spark.salesforce") \
        .option("username", "your_salesforce_username") \
        .option("password", "your_salesforce_password_with_security_token") \
        .option("soql", "select id, name, email from lead") \
        .load()
```

Target

The Target gem writes data to Salesforce objects and allows you to optionally specify the following additional properties.

Target properties

| Property | Description | Default |
| --- | --- | --- |
| Description | Description of your dataset. | None |
| SF object to be updated (Optional) | Salesforce object to update. | None |
| Name of the dataset to be created in Salesforce Wave | Name of the dataset to create in Salesforce Wave. | None |
| Metadata configuration in json (Optional) | JSON-formatted metadata configuration used to construct the Salesforce Wave dataset metadata. | None |
| External ID field name for Salesforce Object (Optional) | Name of the external ID field in a Salesforce object when the Target gem updates or upserts into Salesforce. | Id |
| Flag to upsert data to Salesforce (Optional) | Whether to upsert data to Salesforce. This performs an insert or update operation using the external ID field name as the primary ID. The Target gem does not update existing fields that are not in the DataFrame being pushed. | false |
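An upsert configuration might compile to something like the sketch below. The option keys (`upsert`, `sfObject`, `externalIdFieldName`) are assumptions based on the springml library's README, and the object name and external ID field are hypothetical placeholders.

```python
# Hedged sketch: upserting a DataFrame into a Salesforce object,
# matching rows on an external ID field.
upsert_options = {
    "upsert": "true",
    "sfObject": "Contact",                    # Placeholder object name
    "externalIdFieldName": "Customer_Id__c",  # Placeholder external ID field
}

def upsert_contacts(in0):
    # Base writer with credentials, then the upsert-specific options.
    writer = in0.write.format("com.springml.spark.salesforce") \
        .option("username", "your_salesforce_username") \
        .option("password", "your_salesforce_password_with_security_token")
    for key, value in upsert_options.items():
        writer = writer.option(key, value)
    writer.save()
```

Because upsert matches on the external ID field, rows with an existing `Customer_Id__c` value are updated and the rest are inserted; fields absent from the DataFrame are left untouched on existing records.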

Compiled code

tip

To see the compiled code of your project, switch to the Code view in the project header.

```python
def write_salesforce(spark: SparkSession, in0: DataFrame):
    in0.write.format("com.springml.spark.salesforce") \
        .option("username", "your_salesforce_username") \
        .option("password", "your_salesforce_password_with_security_token") \
        .option("DatasetName", "your_Dataset_name") \
        .save()
```