Salesforce
This Gem supports the following features:
- Dataset Creation - Create a Dataset in Salesforce Wave from a Spark DataFrame.
- Read Salesforce Wave Dataset - Provide a SAQL query to read data from Salesforce Wave. The query result is returned as a DataFrame.
- Read Salesforce Object - Provide a SOQL query to read data from a Salesforce object. The query result is returned as a DataFrame.
- Update Salesforce Object - Update a Salesforce object with the data present in a DataFrame.
note
This connector is built on top of the existing spark-salesforce connector.
To use this Gem in Prophecy, the com.springml:spark-salesforce_2.12:1.1.4
Maven external dependency needs to be installed on the cluster.
To install dependencies from the Prophecy UI, please check the dependency management docs.
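If you run Spark outside Prophecy and want to experiment with the connector directly, you can attach the same coordinate when building the session. This is a minimal sketch; the application name is an arbitrary placeholder, and only the Maven coordinate comes from the note above.

```python
from pyspark.sql import SparkSession

# Minimal sketch: pull the connector from Maven at session startup.
# The app name is a placeholder; the coordinate matches the note above.
spark = (
    SparkSession.builder
        .appName("salesforce-example")
        .config("spark.jars.packages", "com.springml:spark-salesforce_2.12:1.1.4")
        .getOrCreate()
)
```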
Source
Reads data from Salesforce objects and Salesforce Wave Datasets.
Source Parameters
Parameter | Description | Required |
---|---|---|
Dataset Name | Name of the Dataset | True |
Credential Type | Credential Type: Databricks Secrets or Username & Password | True |
Credentials | Databricks credential name, or the username and password for the Salesforce account | Required if Credential Type is Databricks Secrets |
Username | Salesforce Wave username. This user must have privileges to upload Datasets and to execute SAQL and SOQL queries. | Required if Credential Type is Username & Password |
Password | Salesforce Wave password. Append the security token to the password. For example, if a user's password is mypassword and the security token is XXXXXXXXXX, the user must provide mypasswordXXXXXXXXXX. | Required if Credential Type is Username & Password |
Login Url | Salesforce login URL. Default value: https://login.salesforce.com. | True |
Read from source | Strategy to read data: SAQL or SOQL. | True |
SAQL Query | (Optional) SAQL query used to query Salesforce Wave. Mandatory for reading a Salesforce Wave Dataset. | |
SOQL Query | (Optional) SOQL query used to query a Salesforce object. Mandatory for reading a Salesforce object such as Opportunity. | |
Version | (Optional) Salesforce API Version. Default 35.0 | |
Infer Schema | (Optional) Infer schema from the query results. Sample rows will be taken to find the datatype. | |
Date Format | (Optional) A format string following java.text.SimpleDateFormat, used when reading timestamps. This applies to TimestampType. By default it is null, which means timestamps are parsed with java.sql.Timestamp.valueOf(). | |
Result Variable | (Optional) Result variable used in the SAQL query. To paginate SAQL queries, this package adds the required offset and limit. For example, in the SAQL query q = load "<Dataset_id>/<Dataset_version_id>"; q = foreach q generate 'Name' as 'Name', 'Email' as 'Email'; the result variable is q. | |
Page Size | (Optional) Page size for each query to be executed against Salesforce Wave. Default value is 2000. This option can only be used if resultVariable is set. | |
Bulk | (Optional) Flag to enable bulk query. This is the preferred method when loading large sets of data. Salesforce will process batches in the background. Default value is false. | |
PK Chunking | (Optional) Flag to enable automatic primary key chunking for the bulk query job. This splits bulk queries into separate batches of the size defined by the chunkSize option. Default value is false, and the default chunk size is 100,000. | |
Chunk size | (Optional) The number of records to include in each batch. Default value is 100,000. This option can only be used when pkChunking is true. Maximum size is 250,000. | |
Timeout | (Optional) The maximum time spent polling for completion of the bulk query job. This option can only be used when bulk is true. | |
Max chars per column | (Optional) The maximum length of a column. This option can only be used when bulk is true. Default value is 4096. | |
Query All | (Optional) Toggle to retrieve deleted and archived records for SOQL queries. Default value is false. |
info
Steps to reset your Salesforce security token can be found at this link.
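For the SAQL path, a paginated Wave read using the Result Variable and Page Size parameters from the table above might look like the following sketch. The option keys (saql, resultVariable, pageSize) follow the spark-salesforce connector's documented names; the Dataset IDs and credentials are placeholders.

```python
# Sketch of a paginated SAQL read against Salesforce Wave.
# Credentials and Dataset IDs are placeholders.
wave_df = (
    spark.read.format("com.springml.spark.salesforce")
        .option("username", "your_salesforce_username")
        .option("password", "your_salesforce_password_with_security_token")
        .option("saql", 'q = load "<Dataset_id>/<Dataset_version_id>"; '
                        "q = foreach q generate 'Name' as 'Name', 'Email' as 'Email';")
        .option("resultVariable", "q")  # lets the connector add offset/limit
        .option("pageSize", "2000")     # rows per page (default 2000)
        .load()
)
```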
Example
Below is an example of fetching all leads from Sales Cloud using the Prophecy IDE.
We will use a SOQL query to read our leads Dataset on Sales Cloud.
Generated Code
- Python
from pyspark.sql import SparkSession, DataFrame

def read_salesforce(spark: SparkSession) -> DataFrame:
    return spark.read\
        .format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("soql", "select id, name, email from lead")\
        .load()
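For larger extracts, the bulk options from the parameters table can be layered onto the same read. This is a sketch under the assumption that the connector's sfObject key names the object being queried (the Target parameters below note it is mandatory when bulk is true); all values are placeholders.

```python
# Sketch: the same lead query via the bulk API. sfObject is mandatory
# when bulk is true; credentials and values are placeholders.
bulk_leads = (
    spark.read.format("com.springml.spark.salesforce")
        .option("username", "your_salesforce_username")
        .option("password", "your_salesforce_password_with_security_token")
        .option("soql", "select id, name, email from lead")
        .option("sfObject", "Lead")
        .option("bulk", "true")         # Salesforce processes batches in the background
        .option("pkChunking", "true")   # split the bulk job by primary key
        .option("chunkSize", "100000")  # records per batch (default)
        .load()
)
```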
Target
Create/update Datasets and Salesforce objects.
Target Parameters
Parameter | Description | Required |
---|---|---|
Dataset Name | Name of the Dataset | True |
Credential Type | Credential Type: Databricks Secrets or Username & Password | True |
Credentials | Databricks credential name, or the username and password for the Salesforce account | Required if Credential Type is Databricks Secrets |
Username | Salesforce Wave username. This user must have privileges to upload Datasets and to execute SAQL and SOQL queries. | Required if Credential Type is Username & Password |
Password | Salesforce Wave password. Append the security token to the password. For example, if a user's password is mypassword and the security token is XXXXXXXXXX, the user must provide mypasswordXXXXXXXXXX. | Required if Credential Type is Username & Password |
Login Url | Salesforce login URL. Default value: https://login.salesforce.com. | True |
Salesforce Dataset name | (Optional) Name of the Dataset to be created in Salesforce Wave. Required for Dataset Creation. | |
Salesforce object name | (Optional) Salesforce object to be updated, e.g. Contact. Mandatory if bulk is true. | |
Metadata Config in JSON | (Optional) Metadata configuration used to construct the [Salesforce Wave Dataset Metadata](https://resources.docs.salesforce.com/sfdc/pdf/bi_dev_guide_ext_data_format.pdf). The metadata configuration has to be provided in JSON format. | |
Upsert | (Optional) Flag to upsert data to Salesforce. This performs an insert or update operation using the "externalIdFieldName" as the primary ID. Existing fields that are not in the DataFrame being pushed will not be updated. Default "false". | |
External Id Field Name | (Optional) The name of the field used as the external ID for Salesforce Object. This value is only used when doing an update or upsert. Default "Id". |
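As a sketch of the upsert flow described above: assuming the connector's sfObject, upsert, and externalIdFieldName option keys (matching the parameter names in the table), a write that inserts or updates Contacts keyed on a hypothetical external ID field might look like this.

```python
# Sketch of an upsert to a Salesforce object. Email__c is a hypothetical
# external ID field; `contacts` is assumed to be an existing DataFrame
# of Contact records, and credentials are placeholders.
contacts.write.format("com.springml.spark.salesforce")\
    .option("username", "your_salesforce_username")\
    .option("password", "your_salesforce_password_with_security_token")\
    .option("sfObject", "Contact")\
    .option("upsert", "true")\
    .option("externalIdFieldName", "Email__c")\
    .save()
```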
Generated Code
- Python
from pyspark.sql import SparkSession, DataFrame

def write_salesforce(spark: SparkSession, in0: DataFrame):
    in0.write.format("com.springml.spark.salesforce")\
        .option("username", "your_salesforce_username")\
        .option("password", "your_salesforce_password_with_security_token")\
        .option("DatasetName", "your_Dataset_name")\
        .save()
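Putting the two halves together, a round trip through the connector could be as simple as the following sketch, reusing the generated functions above; the column rename is an arbitrary illustration.

```python
# Hypothetical end-to-end flow: read leads via SOQL, adjust a column,
# and push the result to Salesforce Wave using the functions above.
leads = read_salesforce(spark)
renamed = leads.withColumnRenamed("email", "Email")
write_salesforce(spark, renamed)
```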