ColumnParser
ProphecySparkBasicsPython 0.2.27+; ProphecyLibsPython 1.9.16; ProphecyLibsScala 8.2.1; Databricks UC Single Cluster 14.3+; Databricks UC Shared 14.3+; Livy: Not Supported
The ColumnParser gem parses XML or JSON records that are stored in a column of your table.
Parameters
Parameter | Description |
---|---|
Source Column Name | The name of the column that contains the XML or JSON records. |
Parser Type | The format of the records to parse (XML or JSON). |
Parsing Method | The method that Prophecy will use to generate the schema of the output. |
When you select a parsing method, you have three options:
- Parse automatically. Prophecy infers the schema by reading the first 40 records.
- Parse from sample record. Prophecy uses the schema that you provide in the sample record.
- Parse from schema. Prophecy uses the schema that you provide in the form of a schema struct.
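The three methods differ only in where the output schema comes from. As a rough illustration in plain Python (not Prophecy's implementation), the sketch below derives a schema from a single sample JSON record, which is conceptually what "Parse from sample record" does. The helper name `infer_schema` and the type labels it returns are hypothetical.

```python
import json


def infer_schema(value):
    """Return a nested description of the types found in a parsed JSON value.

    Hypothetical helper for illustration; the type labels are assumptions.
    """
    if isinstance(value, dict):
        # Objects become struct-like mappings: one entry per field.
        return {key: infer_schema(child) for key, child in value.items()}
    if isinstance(value, list):
        # Arrays take the type of their first element, if there is one.
        return [infer_schema(value[0])] if value else ["unknown"]
    if isinstance(value, bool):
        # bool must be checked before int, because bool is a subclass of int.
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"


sample_record = '{"id": 7, "name": "Ada", "tags": ["a", "b"], "address": {"zip": "02139"}}'
schema = infer_schema(json.loads(sample_record))
print(schema)
```

Automatic parsing can be thought of as the same idea applied to the first 40 records rather than one sample, while "Parse from schema" skips inference because you supply the struct yourself.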
Output
The schema of the ColumnParser gem output includes the parsed content as a struct data type, in addition to all of the input columns.
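To picture the output shape, the following plain-Python sketch models rows as dictionaries. It is only an analogy: the column names `id`, `payload`, and `payload_parsed` are hypothetical, and the name of the parsed column in real gem output may differ.

```python
import json

# Input rows: each has an ordinary column and a column holding raw JSON text.
input_rows = [
    {"id": 1, "payload": '{"city": "Oslo", "temp": 4.5}'},
    {"id": 2, "payload": '{"city": "Lima", "temp": 19.0}'},
]


def add_parsed_column(row, source_column, parsed_column):
    """Return a copy of the row with the parsed JSON added as a nested value."""
    output = dict(row)  # keep every input column unchanged
    # The nested dict stands in for a struct column (hypothetical column name).
    output[parsed_column] = json.loads(row[source_column])
    return output


output_rows = [add_parsed_column(r, "payload", "payload_parsed") for r in input_rows]
print(output_rows[0])
```

Each output row keeps `id` and the raw `payload` text, and gains one nested value whose fields can be addressed individually, much like the fields of a struct.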
Example code
Tip: To see the generated source code of your project, switch to the Code view in the project header.
This example shows the code to parse XML.
- Python
def xml_column_parser(spark: SparkSession, in0: DataFrame) -> DataFrame:
    from prophecy.libs.utils import xml_parse
    return xml_parse(in0, "XML", "parseAuto", None, None)