
Hive Table

Reads data from and writes data to Hive tables that are managed by the execution environment's metadata catalog (Metastore).

note

Select Hive as the provider on the Properties page.

Source

Source Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Database name | Name of the database | True | |
| Table name | Name of the table | True | |
| Provider | Must be set to `hive` | True | |
| Filter Predicate | `WHERE` clause to filter the table | False | (all records) |

Source Example

Generated Code

Without filter predicate

def Source(spark: SparkSession) -> DataFrame:
    return spark.read.table(f"test_db.test_table")

With filter predicate

def Source(spark: SparkSession) -> DataFrame:
    return spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
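To make the relationship between the two variants concrete, here is a minimal sketch of how the optional Filter Predicate extends the query that gets executed. `build_source_query` is a hypothetical helper for illustration only, not part of the product's code generator:

```python
from typing import Optional

def build_source_query(database: str, table: str,
                       filter_predicate: Optional[str] = None) -> str:
    """Assemble the SELECT statement a source read could run.

    Hypothetical helper: illustrates how an optional filter predicate
    turns a full-table read into a filtered one.
    """
    query = f"SELECT * FROM {database}.{table}"
    if filter_predicate:  # no predicate -> all records are read
        query += f" WHERE {filter_predicate}"
    return query

# Matches the generated code above:
print(build_source_query("test_db", "test_table", "col > 10"))
# -> SELECT * FROM test_db.test_table WHERE col > 10
print(build_source_query("test_db", "test_table"))
# -> SELECT * FROM test_db.test_table
```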

Target

Target Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Database name | Name of the database | True | |
| Table name | Name of the table | True | |
| Custom file path | Use a custom file path to store the underlying files | False | |
| Provider | Must be set to `hive` | True | |
| Write Mode | How to handle existing data. See the table below for a list of available options. | True | `error` |
| File Format | File format to use when saving data. See the list below for supported formats. | True | `parquet` |
| Partition Columns | Columns to partition by | False | (empty) |
| Use insert into | If true, use `.insertInto` instead of `.save` when generating code | False | `false` |

Supported Write Modes

| Write Mode | Description |
|---|---|
| `overwrite` | If data already exists, overwrite it with the contents of the DataFrame. |
| `append` | If data already exists, append the contents of the DataFrame. |
| `ignore` | If data already exists, do nothing with the contents of the DataFrame. This is similar to a `CREATE TABLE IF NOT EXISTS` in SQL. |
| `error` | If data already exists, throw an exception. |
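The four modes can be illustrated with a small plain-Python stand-in, where a dict plays the role of the Metastore. This is not Spark code; it only mirrors the semantics in the table above:

```python
# Plain-Python illustration of the four write modes above.
# `catalog` stands in for the Metastore; rows stand in for table data.
catalog = {}

def write(table: str, rows: list, mode: str = "error") -> None:
    exists = table in catalog
    if mode == "overwrite":
        catalog[table] = list(rows)                 # replace existing data
    elif mode == "append":
        catalog.setdefault(table, []).extend(rows)  # add to existing data
    elif mode == "ignore":
        if not exists:                              # like CREATE TABLE IF NOT EXISTS
            catalog[table] = list(rows)
    elif mode == "error":
        if exists:
            raise ValueError(f"table {table} already exists")
        catalog[table] = list(rows)
    else:
        raise ValueError(f"unknown write mode: {mode}")

write("t", [1, 2])                 # default 'error': table is new, so this succeeds
write("t", [3], mode="append")     # t -> [1, 2, 3]
write("t", [9], mode="ignore")     # t already exists: no-op
write("t", [9], mode="overwrite")  # t -> [9]
```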

Supported File formats

  1. Parquet
  2. Text file
  3. Avro
  4. ORC
  5. RC file
  6. Sequence file

Target Example

Generated Code

def Target(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("hive")\
        .option("fileFormat", "parquet")\
        .mode("overwrite")\
        .saveAsTable("test_db.test_table")
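As a sketch of how the Partition Columns and Use insert into options might change the emitted writer chain, here is a hypothetical helper that renders the code as a string. It is for illustration only; the real generator's output may differ in detail:

```python
def build_target_code(database: str, table: str, *, file_format: str = "parquet",
                      mode: str = "error", partition_columns=(),
                      use_insert_into: bool = False) -> str:
    """Render a writer chain like the generated code above.

    Hypothetical sketch. Note that Spark itself rejects combining
    .partitionBy() with .insertInto(), so only pass partition_columns
    on the saveAsTable path.
    """
    lines = [
        "in0.write\\",
        '    .format("hive")\\',
        f'    .option("fileFormat", "{file_format}")\\',
        f'    .mode("{mode}")\\',
    ]
    if partition_columns:
        cols = ", ".join(f'"{c}"' for c in partition_columns)
        lines.append(f"    .partitionBy({cols})\\")
    if use_insert_into:
        lines.append(f'    .insertInto("{database}.{table}")')
    else:
        lines.append(f'    .saveAsTable("{database}.{table}")')
    return "\n".join(lines)

# Reproduces the shape of the example above:
print(build_target_code("test_db", "test_table", mode="overwrite"))
```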