FTP

Allows you to read or write files (CSV, text, and binary) at a remote location.

Source Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| Dataset Name | Name of the Dataset | True |
| Credential Type | Credential Type: Databricks Secrets or Username & Password | True |
| Credentials | Databricks credential name; otherwise, username and password for the remote account | Required if Databricks Secrets is chosen as Credential Type |
| Username | Login name for the remote user | Required if Username & Password is chosen as Credential Type |
| Password | Password for the remote user | Required if Username & Password is chosen as Credential Type |
| Protocol | Protocol to use for file transfer: FTP or SFTP | Required if Username & Password is chosen as Credential Type |
| Host | Hostname of your remote account. E.g. `prophecy.files.com` | True |
| Path | Path of the file(s) or folder to be loaded. Supports wildcard matching at the lowest level of the path. E.g. `/folder`, `/folder/test.csv`, `/folder/*.csv` | True |
| File Format | Format of the file to be loaded. Supported formats are text, csv, and binary | True |

Target Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| Dataset Name | Name of the Dataset | True |
| Credential Type | Credential Type: Databricks Secrets or Username & Password | True |
| Credentials | Databricks credential name; otherwise, username and password for the remote account | Required if Databricks Secrets is chosen as Credential Type |
| Username | Login name for the remote user | Required if Username & Password is chosen as Credential Type |
| Password | Password for the remote user | Required if Username & Password is chosen as Credential Type |
| Protocol | Protocol to use for file transfer: FTP or SFTP | Required if Username & Password is chosen as Credential Type |
| Host | Hostname of your remote account. E.g. `prophecy.files.com` | True |
| Path | Path of the file(s) or folder to be written. Supports wildcard matching at the lowest level of the path. E.g. `/folder`, `/folder/test.csv`, `/folder/*.csv` | True |
| File Format | Format of the file to be written. Supported formats are text, csv, and binary | True |
| Write Mode | How to handle existing data, if present, while writing: Error or Overwrite | True |
info

Based on the selected File Format, you can provide additional read/write options in the Properties tab. For example, if the File Format is CSV, you can set CSV-specific options such as header and separator.
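As a hedged sketch of what those Properties-tab options correspond to in generated code (the `header` and `sep` option names come from the standard Spark CSV data source; the pipe separator here is just an illustrative value):

```scala
// Hypothetical: CSV-specific options layered on top of the connector options
val df = spark.read
  .format("io.prophecy.spark.filetransfer")
  .option("protocol", "sftp")
  .option("host", "prophecy.files.com")
  .option("fileFormat", "csv")
  .option("header", true) // CSV-specific: treat the first line as a header
  .option("sep", "|")     // CSV-specific: custom column separator
  .load("/folder/*.csv")
```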

note

For SFTP, make sure the dependency `io.prophecy.spark:filetransfer_2.12:0.1.1` is included in your Pipeline. Read more about how to manage dependencies.
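If your Pipeline is built with sbt, the dependency coordinates from the note above could be declared like this (a minimal sketch, assuming an sbt-based build; the coordinates are taken verbatim from the note):

```scala
// build.sbt — explicit Scala 2.12 artifact, so plain % rather than %%
libraryDependencies += "io.prophecy.spark" % "filetransfer_2.12" % "0.1.1"
```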

Loading a CSV file from SFTP

Step 1 - Create Source Component

import org.apache.spark.sql.{DataFrame, SparkSession}

object load_csv {

  def apply(spark: SparkSession): DataFrame = {
    // Base reader configured for SFTP via the Prophecy file-transfer connector
    var reader = spark.read
      .format("io.prophecy.spark.filetransfer")
      .option("protocol", "sftp")
      .option("host", "prophecy.files.com")
      .option("port", "22")
      .option("username", "maciej@prophecy.io")
      .option("password", "******")
      .option("fileFormat", "csv")
    // CSV-specific read options
    reader = reader
      .option("header", true)
      .option("sep", ",")
    // Load every file matching the wildcard path
    reader.load("/folder/*.csv")
  }

}
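A brief usage sketch (hypothetical; assumes an active `SparkSession` and a reachable SFTP host with matching files):

```scala
val df = load_csv(spark) // loads every file matching /folder/*.csv
df.printSchema()
```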

Writing a CSV file to SFTP

Step 1 - Create Target Component

import org.apache.spark.sql.{DataFrame, SparkSession}

object write_csv {

  def apply(spark: SparkSession, in: DataFrame): Unit = {
    // coalesce(1) produces a single output file on the remote host
    var writer = in
      .coalesce(1)
      .write
      .format("io.prophecy.spark.filetransfer")
      .option("protocol", "sftp")
      .option("host", "prophecy.files.com")
      .option("port", "22")
      .option("username", "maciej@prophecy.io")
      .option("password", "******")
      .option("fileFormat", "csv")
      .mode("overwrite")
    // CSV-specific write options
    writer = writer
      .option("header", true)
      .option("sep", ",")
    writer.save("/cust_output.csv")
  }

}
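Putting the two components together, a hedged end-to-end sketch (assumes both objects above are on the classpath and the remote host is reachable):

```scala
// Read CSVs from SFTP, then write the result back as a single CSV file
val df = load_csv(spark)
write_csv(spark, df)
```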