
Hive Table

Reads from and writes to Hive tables managed by your execution environment's metadata catalog.

Prerequisites

Before you specify parameters and properties, select the Hive table type:

  1. Open the Source or Target gem configuration.
  2. On the Type & Format page, select Catalog Table.
  3. On the Properties page, set the provider property to hive.

Parameters

| Parameter | Tab | Description |
| --- | --- | --- |
| Use Unity Catalog | Location | Whether to use a Unity catalog. |
| Catalog | Location | If you use a Unity catalog, specify which catalog to use. |
| Database | Location | Name of the database to connect to. |
| Table | Location | Name of the table to connect to. |
| Use file path | Location | Whether to use a custom file path to store underlying files in the Target gem. See the sketch after this table. |
| Schema | Properties | Schema to apply on the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |
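
If you enable Use file path, the compiled write would typically pass the custom location through Spark's path option, which makes the saved table external. This is a minimal sketch, assuming a hypothetical dbfs:/custom/test_table location; the exact compiled output depends on your gem configuration:

from pyspark.sql import DataFrame, SparkSession

def Target(spark: SparkSession, in0: DataFrame):
    # Hypothetical custom location; the "path" option stores the
    # underlying files there instead of the default warehouse path.
    in0.write \
        .format("hive") \
        .option("path", "dbfs:/custom/test_table") \
        .mode("overwrite") \
        .saveAsTable("test_db.test_table")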

Source

The Source gem reads data from Hive tables and allows you to optionally specify the following additional properties.

Source properties

| Property | Description | Default |
| --- | --- | --- |
| Description | Description of your dataset. | None |
| Provider | Provider to use. You must set this to hive. | delta |
| Filter Predicate | Where clause to filter the table by. | (all records) |

Source example

Compiled code

Tip: To see the compiled code of your project, switch to the Code view in the project header.

Without filter predicate

def Source(spark: SparkSession) -> DataFrame:
    return spark.read.table("test_db.test_table")

With filter predicate

def Source(spark: SparkSession) -> DataFrame:
    return spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")

Target

The Target gem writes data to Hive tables and allows you to optionally specify the following additional properties.

Target properties

| Property | Description | Default |
| --- | --- | --- |
| Description | Description of your dataset. | None |
| Provider | Provider to use. You must set this to hive. | delta |
| Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | error |
| File Format | File format to use when saving data. Supported file formats are: sequencefile, rcfile, orc, parquet, textfile, and avro. | parquet |
| Partition Columns | List of columns to partition the Hive table by. See the partitioned-write sketch after the Target example. | None |
| Use insert into | Whether to use the insertInto() method to write instead of the save() method. See the sketch after this table. | false |
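
When you enable Use insert into, the compiled code calls insertInto() instead of saveAsTable(). Keep in mind that insertInto() requires the table to already exist and resolves columns by position rather than by name. This is a minimal sketch of what the compiled code might look like; the exact output depends on your configuration:

from pyspark.sql import DataFrame, SparkSession

def Target(spark: SparkSession, in0: DataFrame):
    # insertInto() writes into an existing Hive table and matches
    # columns by position, not by name.
    in0.write \
        .mode("append") \
        .insertInto("test_db.test_table")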

Supported write modes

| Write mode | Description |
| --- | --- |
| overwrite | If the data already exists, overwrite the data with the contents of the DataFrame. |
| error | If the data already exists, throw an exception. |
| append | If the data already exists, append the contents of the DataFrame. |
| ignore | If the data already exists, do nothing with the contents of the DataFrame. This is similar to the CREATE TABLE IF NOT EXISTS clause in SQL. |
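
Each write mode maps directly to Spark's DataFrameWriter.mode() call in the compiled code. For example, a minimal sketch of an ignore write against the same test_db.test_table used in the example below:

from pyspark.sql import DataFrame, SparkSession

def Target(spark: SparkSession, in0: DataFrame):
    # "ignore" leaves an existing table untouched, similar to
    # CREATE TABLE IF NOT EXISTS in SQL.
    in0.write \
        .format("hive") \
        .mode("ignore") \
        .saveAsTable("test_db.test_table")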

Target example

Compiled code

Tip: To see the compiled code of your project, switch to the Code view in the project header.

def Target(spark: SparkSession, in0: DataFrame):
    in0.write \
        .format("hive") \
        .option("fileFormat", "parquet") \
        .mode("overwrite") \
        .saveAsTable("test_db.test_table")
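
If you set Partition Columns, the compiled code adds a partitionBy() call to the write. This is a minimal sketch, assuming a hypothetical country column in the incoming DataFrame:

from pyspark.sql import DataFrame, SparkSession

def Target(spark: SparkSession, in0: DataFrame):
    # Partitions the Hive table by the hypothetical "country" column;
    # one partition directory is created per distinct value.
    in0.write \
        .format("hive") \
        .option("fileFormat", "parquet") \
        .mode("overwrite") \
        .partitionBy("country") \
        .saveAsTable("test_db.test_table")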