Hive Table
Reads from and writes to Hive tables that are managed by the execution environment's metadata catalog (Metastore).
note
Choose Hive as the provider on the Properties page.
Source
Source Parameters
Parameter | Description | Required | Default |
---|---|---|---|
Database name | Name of the database | True | |
Table name | Name of the table | True | |
Provider | Must be set to hive | True | |
Filter Predicate | Where clause to filter the table | False | (all records) |
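When a filter predicate is supplied, the generated code reads through a SELECT statement instead of a plain `spark.read.table` call. A minimal sketch of how that query string could be assembled (the helper name is hypothetical, not part of the actual code generator):

```python
from typing import Optional

def build_source_query(database: str, table: str,
                       filter_predicate: Optional[str] = None) -> str:
    # Base query reads every column of the managed table.
    query = f"SELECT * FROM {database}.{table}"
    # The filter predicate is pasted in verbatim as a WHERE clause.
    if filter_predicate:
        query += f" WHERE {filter_predicate}"
    return query

# build_source_query("test_db", "test_table", "col > 10")
# → "SELECT * FROM test_db.test_table WHERE col > 10"
```

With no predicate, the simpler `spark.read.table` form is used instead, as the examples below show.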
Source Example
Generated Code
Without filter predicate
- Python
- Scala
def Source(spark: SparkSession) -> DataFrame:
    return spark.read.table("test_db.test_table")
object Source {
  def apply(spark: SparkSession): DataFrame = {
    spark.read.table("test_db.test_table")
  }
}
With filter predicate
- Python
- Scala
def Source(spark: SparkSession) -> DataFrame:
    return spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")

object Source {
  def apply(spark: SparkSession): DataFrame =
    spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
}
Target
Target Parameters
Parameter | Description | Required | Default |
---|---|---|---|
Database name | Name of the database | True | |
Table name | Name of the table | True | |
Custom file path | Custom location at which to store the table's underlying files. | False | |
Provider | Must be set to hive | True | |
Write Mode | How to handle existing data. See the Supported Write Modes table below for available options. | True | error |
File Format | File format to use when saving data. See the Supported File Formats list below. | True | parquet |
Partition Columns | Columns to partition by | False | (empty) |
Use insert into | If true, use .insertInto instead of .save when generating code. | False | false |
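Taken together, these parameters determine the writer chain that the gem emits. A rough sketch of how the generated call could vary with them (the function and its structure are illustrative assumptions, not the actual code generator):

```python
from typing import List, Optional

def render_target_code(database: str, table: str,
                       file_format: str = "parquet",
                       write_mode: str = "error",
                       partition_columns: Optional[List[str]] = None,
                       use_insert_into: bool = False) -> str:
    # Build the DataFrameWriter chain line by line.
    lines = [
        'in0.write',
        '    .format("hive")',
        f'    .option("fileFormat", "{file_format}")',
    ]
    # Partition Columns, when set, adds a .partitionBy(...) call.
    if partition_columns:
        cols = ", ".join(f'"{c}"' for c in partition_columns)
        lines.append(f"    .partitionBy({cols})")
    lines.append(f'    .mode("{write_mode}")')
    # "Use insert into" swaps the terminal writer call.
    terminal = "insertInto" if use_insert_into else "saveAsTable"
    lines.append(f'    .{terminal}("{database}.{table}")')
    return "\n".join(lines)
```

For example, the defaults render the `saveAsTable` form shown in the Target Example below, while setting Use insert into renders an `.insertInto("db.table")` terminal call instead.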
Supported Write Modes
Write Mode | Description |
---|---|
overwrite | If data already exists, overwrite with the contents of the DataFrame. |
append | If data already exists, append the contents of the DataFrame. |
ignore | If data already exists, do nothing with the contents of the DataFrame. This is similar to a CREATE TABLE IF NOT EXISTS in SQL. |
error | If data already exists, throw an exception. |
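The four modes behave like the following toy in-memory model (the dict stands in for the metastore and the function name is illustrative only):

```python
def write_table(store: dict, name: str, rows: list, mode: str = "error") -> list:
    """Toy model of the four write modes against an in-memory 'metastore'."""
    exists = name in store
    if mode == "overwrite":
        # Replace any existing contents with the new rows.
        store[name] = list(rows)
    elif mode == "append":
        # Add the new rows after any existing contents.
        store[name] = store.get(name, []) + list(rows)
    elif mode == "ignore":
        # Only write if the table does not exist yet.
        if not exists:
            store[name] = list(rows)
    elif mode == "error":
        # Fail loudly rather than touch existing data.
        if exists:
            raise ValueError(f"Table {name} already exists")
        store[name] = list(rows)
    else:
        raise ValueError(f"Unknown write mode: {mode}")
    return store[name]
```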
Supported File Formats
- Parquet
- Text file
- Avro
- ORC
- RC file
- Sequence file
Target Example
Generated Code
- Python
- Scala
def Target(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("hive")\
        .option("fileFormat", "parquet")\
        .mode("overwrite")\
        .saveAsTable("test_db.test_table")
object Target {
  def apply(spark: SparkSession, in: DataFrame): Unit = {
    in.write
      .format("hive")
      .option("fileFormat", "parquet")
      .mode("overwrite")
      .saveAsTable("test_db.test_table")
  }
}