Salesforce
This connector is built on top of the open-source Spark Salesforce library. To use it, install the com.springml:spark-salesforce_2.12:1.1.4 Maven dependency on your cluster.
To learn about installing dependencies in Prophecy UI, see Spark dependencies.
With the Source and Target gems, you can perform the following with Salesforce:
- Create datasets in Salesforce Wave from a Spark DataFrame.
- Read a Salesforce Wave dataset. You provide a SAQL query, and the Source gem constructs the query result as a DataFrame.
- Read a Salesforce object. You provide a SOQL query, and the Source gem constructs the query result as a DataFrame.
- Update a Salesforce object. The Target gem updates the Salesforce object with the details present in the DataFrame.
Prerequisites
Before you specify parameters and properties, select the Salesforce application:
- Open the gem configuration.
- On the Type & Format page, navigate to the Applications tab.
- Select Salesforce.
Parameters
Parameter | Tab | Description |
---|---|---|
Credentials | Location | How to provide your credentials. You can select: Databricks Secrets, Username & Password, or Environment variables. |
User Name | Location | Salesforce Wave username. This user must have privileges to upload datasets or execute SAQL or SOQL queries. |
Password | Location | Salesforce Wave password. Append your security token to your password. To reset your Salesforce security token, see Reset Your Security Token. |
Login URL | Location | Salesforce login URL. Default: https://login.salesforce.com. |
Data Source | Location | Strategy to read data in the Source gem. Possible values are: SAQL or SOQL. |
SAQL Query | Location | If you select SAQL as the Data Source, the SAQL query to use to query Salesforce Wave. |
SOQL Query | Location | If you select SOQL as the Data Source, the SOQL query to use to query the Salesforce object. |
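For reference, here is a minimal sketch of how these parameters map onto the underlying library when you read credentials from Databricks Secrets. The secret scope and key names are hypothetical, and the login option follows the library's documented default:

# A minimal sketch, assuming you run on Databricks and store credentials in
# Databricks Secrets. The scope "salesforce" and keys "username" and
# "password" are hypothetical names.
username = dbutils.secrets.get(scope="salesforce", key="username")
password = dbutils.secrets.get(scope="salesforce", key="password")  # security token appended

df = spark.read\
    .format("com.springml.spark.salesforce")\
    .option("username", username)\
    .option("password", password)\
    .option("login", "https://login.salesforce.com")\
    .option("soql", "select id, name from lead")\
    .load()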
Source
The Source gem reads data from Salesforce objects and allows you to optionally specify the following additional properties.
Source properties
Properties | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Primary key chunking (Optional) | Whether to enable automatic primary key chunking for the bulk query job. This splits bulk queries into separate batches of the size defined by the Chunk size property. You can only use this property when you enable bulk query. | false |
Chunk size | Number of records to include in each batch. You can only use this property when you enable Primary key chunking. Maximum size is 250000. | 100000 |
Timeout | Maximum time spent polling for the completion of the bulk query job. You can only use this property when you enable bulk query. | None |
Max Length of column (Optional) | Maximum length of a column. You can only use this property when you enable bulk query. | 4096 |
External ID field name for Salesforce Object (Optional) | Name of the external ID field in a Salesforce object. | Id |
Enable bulk query (Optional) | Whether to enable bulk query. This is the preferred method when loading large sets of data. Salesforce processes batches in the background. | false |
Retrieve deleted and archived records (Optional) | Whether to retrieve deleted and archived records for SOQL queries. | false |
Infer Schema | Whether to infer schema from the query results. The Source gem takes sample rows to find the datatype. | false |
Date Format | Format string for java.text.SimpleDateFormat to use when reading timestamps. This applies to TimestampType columns. | null |
Salesforce API Version (Optional) | Version of the Salesforce API to use. | 35.0 |
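To illustrate how these properties fit together, a bulk read with primary key chunking might look like the sketch below. The option names (bulk, pkChunking, chunkSize) follow the underlying spark-salesforce library, and the query and credentials are placeholders:

# A minimal sketch of a bulk query with primary key chunking enabled.
# Values are examples; replace them with your own.
df = spark.read\
    .format("com.springml.spark.salesforce")\
    .option("username", "your_salesforce_username")\
    .option("password", "your_salesforce_password_with_security_token")\
    .option("soql", "select id, name, email from lead")\
    .option("bulk", "true")\
    .option("pkChunking", "true")\
    .option("chunkSize", "100000")\
    .load()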
Example
The following example uses a SOQL query to query a leads dataset on Sales Cloud.
Compiled code
To see the compiled code of your project, switch to the Code view in the project header.
- Python
from pyspark.sql import SparkSession, DataFrame

def read_salesforce(spark: SparkSession) -> DataFrame:
    return spark.read\
        .format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("soql", "select id, name, email from lead")\
        .load()
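If you select SAQL as the Data Source instead, the reader passes a saql option rather than soql. A minimal sketch, assuming the saql option of the underlying spark-salesforce library; the dataset ID, version, and field names are placeholders:

def read_salesforce_wave(spark: SparkSession) -> DataFrame:
    # Sketch of the SAQL path; replace the placeholder dataset ID and version.
    return spark.read\
        .format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("saql", "q = load \"<dataset_id>/<dataset_version_id>\"; q = foreach q generate 'Name' as 'Name';")\
        .load()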
Target
The Target gem writes data to Salesforce objects and allows you to optionally specify the following additional properties.
Target properties
Property | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
SF object to be updated (Optional) | Salesforce object to update (for example, Contact). | None |
Name of the dataset to be created in Salesforce Wave | Name of the Dataset to create in Salesforce Wave. | None |
Metadata configuration in json (Optional) | JSON formatted metadata configuration to construct a Salesforce Wave Dataset Metadata. | None |
External ID field name for Salesforce Object (Optional) | Name of the external ID field in a Salesforce object when the Target gem updates or upserts into Salesforce. | Id |
Flag to upsert data to Salesforce (Optional) | Whether to upsert data to Salesforce. This property performs an insert or update operation using the externalIdFieldName as the primary ID. The Target gem does not update existing fields that are not in the DataFrame being pushed. | false |
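For example, an upsert into a Salesforce object keyed on an external ID might look like the sketch below. The option names (sfObject, upsert, externalIdFieldName) follow the underlying spark-salesforce library; the object and field names are placeholders:

def upsert_salesforce(spark: SparkSession, in0: DataFrame):
    # A sketch of an upsert keyed on an external ID field.
    # "Contact" and "External_Id__c" are placeholder names.
    in0.write.format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("sfObject", "Contact")\
        .option("upsert", "true")\
        .option("externalIdFieldName", "External_Id__c")\
        .save()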
Compiled code
To see the compiled code of your project, switch to the Code view in the project header.
- Python
from pyspark.sql import SparkSession, DataFrame

def write_salesforce(spark: SparkSession, in0: DataFrame):
    in0.write.format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("datasetName", "your_dataset_name")\
        .save()