
Salesforce

This Gem provides the following features:

  1. Dataset Creation - Create a Dataset in Salesforce Wave from a Spark DataFrame.
  2. Read Salesforce Wave Dataset - Provide a SAQL query to read data from Salesforce Wave. The query result is returned as a DataFrame.
  3. Read Salesforce Object - Provide a SOQL query to read data from a Salesforce object. The query result is returned as a DataFrame.
  4. Update Salesforce Object - Update a Salesforce object with the contents of a DataFrame.
note

This connector is built on top of the already available spark-salesforce connector.

To use this Gem in Prophecy, the com.springml:spark-salesforce_2.12:1.1.4 Maven external dependency needs to be installed on the cluster. To install dependencies from the Prophecy UI, please check the dependency management docs.
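
Outside Prophecy, for example when testing a Pipeline locally, the same dependency can be pulled in through Spark's spark.jars.packages configuration. A minimal sketch; the application name is illustrative:

```py
from pyspark.sql import SparkSession

# Local SparkSession for testing; spark.jars.packages fetches the connector
# and its transitive dependencies from Maven at startup.
spark = (
    SparkSession.builder
    .appName("salesforce-connector-test")
    .config("spark.jars.packages", "com.springml:spark-salesforce_2.12:1.1.4")
    .getOrCreate()
)
```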

Source

Reads data from Salesforce objects and Salesforce Wave Datasets.

Source Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Dataset Name | Name of the Dataset | True |
| Credential Type | Credential type: Databricks Secrets or Username & Password | True |
| Credentials | Databricks credential name, else username and password for the Salesforce account | Required if Credential Type is Databricks Secrets |
| Username | Salesforce Wave username. This user should have the privilege to upload Datasets or execute SAQL or SOQL queries. | Required if Credential Type is Username & Password |
| Password | Salesforce Wave password. Append the security token to the password. For example, if a user's password is mypassword and the security token is XXXXXXXXXX, the user must provide mypasswordXXXXXXXXXX. | Required if Credential Type is Username & Password |
| Login Url | Salesforce login URL. Default value: https://login.salesforce.com. | True |
| Read from source | Strategy to read data: SAQL or SOQL | True |
| SAQL Query | SAQL query used to query Salesforce Wave. Mandatory for reading a Salesforce Wave Dataset. | False |
| SOQL Query | SOQL query used to query a Salesforce object. Mandatory for reading a Salesforce object such as Opportunity. | False |
| Version | Salesforce API version. Default: 35.0. | False |
| Infer Schema | Infer the schema from the query results. Sample rows are taken to determine the data types. | False |
| Date Format | A string following java.text.SimpleDateFormat, used when reading timestamps. This applies to TimestampType. By default it is null, which means timestamps are parsed via java.sql.Timestamp.valueOf(). | False |
| Result Variable | Result variable used in the SAQL query. To paginate SAQL queries, this package adds the required offset and limit. For example, in the SAQL query `q = load "<Dataset_id>/<Dataset_version_id>"; q = foreach q generate 'Name' as 'Name', 'Email' as 'Email';`, q is the result variable. | False |
| Page Size | Page size for each query executed against Salesforce Wave. Default: 2000. Can only be used if Result Variable is set. | False |
| Bulk | Flag to enable bulk queries, the preferred method when loading large sets of data. Salesforce processes the batches in the background. Default: false. | False |
| PK Chunking | Flag to enable automatic primary-key chunking for the bulk query job. This splits the bulk query into separate batches of the size defined by the Chunk Size option. Default: false, with a default chunk size of 100,000. | False |
| Chunk Size | Number of records to include in each batch. Default: 100,000. Can only be used when PK Chunking is true. Maximum: 250,000. | False |
| Timeout | Maximum time spent polling for the completion of the bulk query job. Can only be used when Bulk is true. | False |
| Max chars per column | Maximum length of a column. Can only be used when Bulk is true. Default: 4096. | False |
| Query All | Toggle to retrieve deleted and archived records with SOQL queries. Default: false. | False |
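
Several of these options work together when reading large objects. Below is a hedged sketch of a bulk read with PK chunking; option names follow the upstream spark-salesforce connector (which also expects the object name via sfObject for bulk queries), and the values are illustrative.

```py
# Hedged sketch: bulk SOQL read with primary-key chunking enabled.
# `spark` is an existing SparkSession; Salesforce splits the job into
# batches of up to chunkSize records and processes them in the background.
df = spark.read\
    .format("com.springml.spark.salesforce")\
    .option("username", "your_salesforce_username")\
    .option("password", "your_salesforce_password_with_security_token")\
    .option("soql", "select id, name from lead")\
    .option("sfObject", "Lead")\
    .option("bulk", "true")\
    .option("pkChunking", "true")\
    .option("chunkSize", "250000")\
    .load()
```
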
info

Steps to reset your Salesforce security token can be found in the Salesforce help documentation.

Example

Below is an example of fetching all leads from Sales Cloud using the Prophecy IDE. We use a SOQL query to query our Leads Dataset on Sales Cloud.

Generated Code

```py
from pyspark.sql import SparkSession, DataFrame

def read_salesforce(spark: SparkSession) -> DataFrame:
    return spark.read\
        .format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("soql", "select id, name, email from lead")\
        .load()
```
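
To read a Salesforce Wave Dataset instead, a SAQL query is passed through the saql option, along with the result variable used for pagination. A minimal sketch, with the Dataset ID and version left as placeholders:

```py
from pyspark.sql import SparkSession, DataFrame

def read_salesforce_wave(spark: SparkSession) -> DataFrame:
    # "<Dataset_id>/<Dataset_version_id>" is a placeholder; q is the result
    # variable the connector uses to add offset/limit for pagination.
    saql = ('q = load "<Dataset_id>/<Dataset_version_id>"; '
            "q = foreach q generate 'Name' as 'Name', 'Email' as 'Email';")
    return spark.read\
        .format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("saql", saql)\
        .option("resultVariable", "q")\
        .load()
```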


Target

Create/update Datasets and Salesforce objects.

Target Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Dataset Name | Name of the Dataset | True |
| Credential Type | Credential type: Databricks Secrets or Username & Password | True |
| Credentials | Databricks credential name, else username and password for the Salesforce account | Required if Credential Type is Databricks Secrets |
| Username | Salesforce Wave username. This user should have the privilege to upload Datasets or execute SAQL or SOQL queries. | Required if Credential Type is Username & Password |
| Password | Salesforce Wave password. Append the security token to the password. For example, if a user's password is mypassword and the security token is XXXXXXXXXX, the user must provide mypasswordXXXXXXXXXX. | Required if Credential Type is Username & Password |
| Login Url | Salesforce login URL. Default value: https://login.salesforce.com. | True |
| Salesforce Dataset name | Name of the Dataset to be created in Salesforce Wave. Required for Dataset creation. | False |
| Salesforce object name | Salesforce object to be updated, e.g. Contact. Mandatory if Bulk is true. | False |
| Metadata Config in JSON | Metadata configuration used to construct the [Salesforce Wave Dataset metadata](https://resources.docs.salesforce.com/sfdc/pdf/bi_dev_guide_ext_data_format.pdf). Must be provided in JSON format. | False |
| Upsert | Flag to upsert data to Salesforce. This performs an insert or update using External Id Field Name as the primary ID. Existing fields that are not in the DataFrame being pushed are not updated. Default: false. | False |
| External Id Field Name | Name of the field used as the external ID for the Salesforce object. Only used when doing an update or upsert. Default: Id. | False |
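
The metadata configuration is passed to the writer as a JSON string. A hedged sketch of its shape, based on the Wave external data format guide linked above; the Dataset and field names are illustrative:

```py
import json

# Illustrative metadata configuration; the authoritative schema is the
# Salesforce Wave external data format guide linked in the table above.
metadata_config = {
    "fileFormat": {"charsetName": "UTF-8", "fieldsDelimitedBy": ","},
    "objects": [{
        "connector": "CSVConnector",
        "fullyQualifiedName": "your_Dataset_name",
        "label": "your_Dataset_name",
        "name": "your_Dataset_name",
        "fields": [
            {"fullyQualifiedName": "Name", "name": "Name",
             "type": "Text", "label": "Name"},
        ],
    }],
}

# Passed to the writer as: .option("metadataConfig", json.dumps(metadata_config))
```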

Generated Code

```py
from pyspark.sql import SparkSession, DataFrame

def write_salesforce(spark: SparkSession, in0: DataFrame):
    in0.write.format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("DatasetName", "your_Dataset_name")\
        .save()
```
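
To update a Salesforce object rather than create a Wave Dataset, the object name and upsert options are set instead. A minimal sketch using the connector's sfObject, upsert, and externalIdFieldName options; the object name here is illustrative:

```py
from pyspark.sql import SparkSession, DataFrame

def upsert_salesforce(spark: SparkSession, in0: DataFrame):
    # Rows matching on the external ID field are updated; the rest are inserted.
    in0.write.format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("sfObject", "Contact")\
        .option("upsert", "true")\
        .option("externalIdFieldName", "Id")\
        .save()
```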