
# Redshift

You can read from and write to Redshift.

## Parameters

| Parameter | Tab | Description |
| --- | --- | --- |
| Username | Location | Username for your JDBC instance. |
| Password | Location | Password for your JDBC instance. |
| JDBC URL | Location | JDBC URL to connect to. You can specify source-specific connection properties in the URL. For example: `jdbc:postgresql://test.us-east-1.rds.amazonaws.com:5432/postgres` or `jdbc:mysql://database-mysql.test.us-east-1.rds.amazonaws.com:3306/mysql` |
| Temporary Directory | Location | S3 location to temporarily store data before it's loaded into Redshift. |
| Data Source | Location | Strategy to read data. In the Source gem, you can select **DB Table** or **SQL Query**. In the Target gem, you must enter a table. To learn more, see [DB Table](#db-table) and [SQL Query](#sql-query). |
| Schema | Properties | Schema to apply on the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |

## DB Table

The DB Table option dictates which table to use as the source to read from. You can use anything that is valid in the FROM clause of a SQL query. For example, instead of a table name, you can use a subquery in parentheses.

:::caution
The DB Table option and the query parameter are mutually exclusive; you cannot specify both at the same time.
:::
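A subquery in place of a table name can be sketched as follows. This is a minimal sketch that assumes the option names of the open-source spark-redshift connector (`url`, `dbtable`, `tempdir`) and its format name; the cluster URL, bucket, and table names are placeholders.

```python
# Sketch: a DB Table read where "dbtable" is a parenthesized subquery.
# Option keys follow the spark-redshift connector; all names and URLs
# below are placeholders.

def dbtable_options(dbtable: str, url: str, tempdir: str) -> dict:
    """Build the read options for a DB Table source.

    Because anything valid in a FROM clause is accepted, `dbtable`
    can be a plain table name or a subquery in parentheses.
    """
    return {"url": url, "dbtable": dbtable, "tempdir": tempdir}

# A subquery with an alias, used instead of a table name:
opts = dbtable_options(
    dbtable="(SELECT id, amount FROM sales WHERE amount > 100) filtered_sales",
    url="jdbc:redshift://example.us-east-1.redshift.amazonaws.com:5439/dev",
    tempdir="s3a://my-bucket/redshift-tmp/",
)

# df = (spark.read.format("io.github.spark_redshift_community.spark.redshift")
#       .options(**opts)
#       .load())
```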

## SQL Query

The SQL Query option specifies which query to use as a subquery in the FROM clause. Spark also assigns an alias to the subquery clause. For example, Spark issues the following query to the JDBC Source:

```sql
SELECT columns FROM (<user_specified_query>) spark_gen_alias
```

The following restrictions exist when you use this option:

  1. You cannot use the query and partitionColumn options at the same time.
  2. If you must specify the partitionColumn option, specify the subquery with the dbtable option instead, and qualify your partition columns using the subquery alias that you provide as part of dbtable.
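The wrapping described above can be sketched as a small helper. Note that `wrap_query` is a hypothetical function for illustration only, not part of Spark's API:

```python
# Illustrative sketch of how Spark rewrites a user-specified query into
# an aliased subquery in the FROM clause. wrap_query is a hypothetical
# helper, not a Spark API.

def wrap_query(user_query: str, columns: str = "*") -> str:
    """Wrap a user query as an aliased subquery in a FROM clause."""
    return f"SELECT {columns} FROM ({user_query}) spark_gen_alias"

print(wrap_query("SELECT id, amount FROM sales WHERE region = 'EMEA'"))
# SELECT * FROM (SELECT id, amount FROM sales WHERE region = 'EMEA') spark_gen_alias
```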

## Source

The Source gem reads data from Redshift and allows you to optionally specify the following additional properties.

### Source properties

| Property | Description | Default |
| --- | --- | --- |
| Forward S3 access credentials to Databricks | Whether to forward S3 access credentials to Databricks. | false |
| Driver | Class name of the Redshift driver to connect to this URL. | None |
| AWS IAM Role | Identity that grants permissions to access other AWS services. | None |
| Temporary AWS access key id | Whether to allow temporary credentials for authenticating to Redshift. | false |
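A read that sets these properties can be sketched as an options map. This sketch assumes the spark-redshift connector's option keys (`aws_iam_role`, `forward_spark_s3_credentials`); the role ARN, cluster URL, and bucket are placeholders, and the connector expects only one credential mechanism at a time.

```python
# Sketch: Source read options using the optional properties above.
# Option keys follow the spark-redshift connector; the ARN, URL, and
# bucket are placeholders.

source_options = {
    "url": "jdbc:redshift://example.us-east-1.redshift.amazonaws.com:5439/dev",
    "dbtable": "public.sales",
    "tempdir": "s3a://my-bucket/redshift-tmp/",
    # AWS IAM Role that grants access to the temporary S3 location:
    "aws_iam_role": "arn:aws:iam::123456789012:role/my-redshift-role",
}

# Alternatively, forward the S3 access credentials instead of using a role
# (use one credential mechanism, not both):
# source_options["forward_spark_s3_credentials"] = "true"

# df = (spark.read.format("io.github.spark_redshift_community.spark.redshift")
#       .options(**source_options)
#       .load())
```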

## Target

The Target gem writes data to Redshift and allows you to optionally specify the following additional properties.

### Target properties

| Property | Description | Default |
| --- | --- | --- |
| Forward S3 access credentials to Databricks | Whether to forward S3 access credentials to Databricks. | false |
| Driver | Class name of the Redshift driver to connect to this URL. | None |
| AWS IAM Role | Identity that grants permissions to access other AWS services. | None |
| Temporary AWS access key id | Whether to allow temporary credentials for authenticating to Redshift. | false |
| Max length for string columns in Redshift | Maximum length for string columns in Redshift. | 2048 |
| Row distribution style for new table | How to distribute data in a new table. For a list of the possible values, see [Supported distribution styles](#supported-distribution-styles). | None |
| Distribution key for new table | If you selected **Key** as the Row distribution style for new table property, specify the key to distribute by. | None |
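A write that sets the distribution properties can be sketched the same way. This sketch assumes the spark-redshift connector's `diststyle` and `distkey` option names; the table, column, and URLs are placeholders.

```python
# Sketch: Target write options that set a row distribution style and key
# for a new table. Option keys follow the spark-redshift connector; all
# names and URLs below are placeholders.

target_options = {
    "url": "jdbc:redshift://example.us-east-1.redshift.amazonaws.com:5439/dev",
    "dbtable": "public.sales_by_customer",
    "tempdir": "s3a://my-bucket/redshift-tmp/",
    "aws_iam_role": "arn:aws:iam::123456789012:role/my-redshift-role",
    # Row distribution style for the new table (EVEN, KEY, or ALL):
    "diststyle": "KEY",
    # Required when diststyle is KEY: the column to distribute by.
    "distkey": "customer_id",
}

# (df.write.format("io.github.spark_redshift_community.spark.redshift")
#    .options(**target_options)
#    .mode("append")
#    .save())
```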

### Supported distribution styles

| Distribution style | Description |
| --- | --- |
| EVEN | Distributes the rows across the slices in a round-robin fashion. |
| KEY | Distributes the rows according to the values in one column. |
| ALL | Distributes a copy of the entire table to every node. |