Hive Table
Reads from and writes to Hive tables that your execution environment's Metadata catalog manages.
Prerequisites
Before you specify parameters and properties, select the Hive table type:
- Open the Source or Target gem configuration.
- On the Type & Format page, select Catalog Table.
- On the Properties page, set the provider property to `hive`.
Parameters
Parameter | Tab | Description |
---|---|---|
Use Unity Catalog | Location | Whether to use a Unity catalog. |
Catalog | Location | If you use a Unity catalog, specify which catalog to use. |
Database | Location | Name of the database to connect to. |
Table | Location | Name of the table to connect to. |
Use file path | Location | Whether to use a custom file path to store underlying files in the Target gem. |
Schema | Properties | Schema to apply on the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code (see the example below). In the Target gem, you can view the schema visually or as JSON code. |
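For illustration only, the following sketch shows how a schema could be built programmatically and rendered as JSON. The column names `id` and `name` are hypothetical, and the exact JSON shown in the Schema property's code view may differ from Spark's `StructType` JSON used here.

```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Hypothetical two-column schema; your real schema comes from the Hive table.
schema = StructType([
    StructField("id", IntegerType(), nullable=True),
    StructField("name", StringType(), nullable=True),
])

# Print a JSON representation of the schema (Spark's StructType JSON format).
print(schema.json())
```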
Source
The Source gem reads data from Hive tables and allows you to optionally specify the following additional properties.
Source properties
Properties | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Provider | Provider to use. You must set this to `hive`. | `delta` |
Filter Predicate | `WHERE` clause to filter the table by. | (all records) |
Source example
Compiled code
tip
To see the compiled code of your project, switch to the Code view in the project header.
Without filter predicate
- Python

```python
def Source(spark: SparkSession) -> DataFrame:
    return spark.read.table(f"test_db.test_table")
```

- Scala

```scala
object Source {
  def apply(spark: SparkSession): DataFrame = {
    spark.read.table("test_db.test_table")
  }
}
```
With filter predicate
- Python

```python
def Source(spark: SparkSession) -> DataFrame:
    return spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
```

- Scala

```scala
object Source {
  def apply(spark: SparkSession): DataFrame =
    spark.sql("SELECT * FROM test_db.test_table WHERE col > 10")
}
```
Target
The Target gem writes data to Hive tables and allows you to optionally specify the following additional properties.
Target properties
Property | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Provider | Provider to use. You must set this to `hive`. | `delta` |
Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | error |
File Format | File format to use when saving data. Supported file formats are: `sequencefile`, `rcfile`, `orc`, `parquet`, `textfile`, and `avro`. | `parquet` |
Partition Columns | List of columns to partition the Hive table by. | None |
Use insert into | Whether to use the `insertInto()` method to write instead of the `saveAsTable()` method (see the sketch after this table). | false |
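As an illustration (not actual compiled output), enabling Use insert into would roughly change the Target write to use `insertInto()` against an existing table. A minimal Python sketch, assuming the table `test_db.test_table` already exists:

```python
from pyspark.sql import SparkSession, DataFrame

def Target(spark: SparkSession, in0: DataFrame):
    # Sketch only: insertInto() writes into an existing table, matching columns
    # by position and reusing the table's existing file format and partitioning.
    in0.write\
        .mode("append")\
        .insertInto("test_db.test_table")
```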
Supported write modes
Write mode | Description |
---|---|
overwrite | If the data already exists, overwrite the data with the contents of the `DataFrame`. |
error | If the data already exists, throw an exception. |
append | If the data already exists, append the contents of the `DataFrame`. |
ignore | If the data already exists, do nothing with the contents of the `DataFrame`. This is similar to the `CREATE TABLE IF NOT EXISTS` clause in SQL. |
Target example
Compiled code
tip
To see the compiled code of your project, switch to the Code view in the project header.
- Python

```python
def Target(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("hive")\
        .option("fileFormat", "parquet")\
        .mode("overwrite")\
        .saveAsTable("test_db.test_table")
```

- Scala

```scala
object Target {
  def apply(spark: SparkSession, in: DataFrame): Unit = {
    in.write
      .format("hive")
      .option("fileFormat", "parquet")
      .mode("overwrite")
      .saveAsTable("test_db.test_table")
  }
}
```
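If you set Partition Columns, the write would additionally partition the table. A minimal Python sketch, assuming a hypothetical partition column named `dt`:

```python
from pyspark.sql import SparkSession, DataFrame

def Target(spark: SparkSession, in0: DataFrame):
    # Sketch only: "dt" is a hypothetical partition column; the actual compiled
    # code depends on your gem configuration.
    in0.write\
        .format("hive")\
        .option("fileFormat", "parquet")\
        .partitionBy("dt")\
        .mode("append")\
        .saveAsTable("test_db.test_table")
```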