ColumnParser

The ColumnParser gem lets you parse XML or JSON records that are stored in a column of your table.

Parameters

Parameter            Description
Source Column Name   The name of the column that contains the XML or JSON records.
Parser Type          The format to parse (XML or JSON).
Parsing Method       The method that Prophecy uses to generate the schema of the output.

When you select a parsing method, you have three options:

  • Parse automatically. Prophecy infers the schema by reading the first 40 records.
  • Parse from sample record. Prophecy uses the schema that you provide in the sample record.
  • Parse from schema. Prophecy uses the schema that you provide in the form of a schema struct.
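
To make the difference between the first two options concrete, here is a minimal plain-Python sketch for JSON records. It is not Prophecy's implementation: the helper names (infer_schema, merge_schemas, schema_from_records, schema_from_sample) are hypothetical, the schema is represented as nested dicts of Python type names rather than a Spark struct, and each record is assumed to be a JSON object.

```python
import json


def infer_schema(value):
    """Describe the shape of one parsed JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: infer_schema(val) for key, val in value.items()}
    if isinstance(value, list):
        # Describe a list by its first element; an empty list stays empty.
        return [infer_schema(value[0])] if value else []
    return type(value).__name__


def merge_schemas(a, b):
    """Union the fields of two struct schemas; keep the first type on conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, val in b.items():
            merged[key] = merge_schemas(a[key], val) if key in a else val
        return merged
    return a


def schema_from_records(records, sample_size=40):
    """'Parse automatically': infer a schema from the first sample_size records."""
    schema = {}
    for raw in records[:sample_size]:
        schema = merge_schemas(schema, infer_schema(json.loads(raw)))
    return schema


def schema_from_sample(sample_record):
    """'Parse from sample record': derive the schema from one record you provide."""
    return infer_schema(json.loads(sample_record))


records = ['{"id": 1, "name": "a"}', '{"id": 2, "tags": ["x"]}']
print(schema_from_records(records))
print(schema_from_sample('{"a": {"b": 1.5}}'))
```

In this sketch, a field that first appears after the sampled records is not part of the inferred schema. If your data has fields that occur only rarely, providing a sample record or an explicit schema may therefore be the safer choice.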

Output

The schema of the ColumnParser gem output includes the parsed content as a struct data type, in addition to all of the input columns.

[Figure: New output struct]
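
The output shape described above can be illustrated with a small plain-Python sketch, where rows are dicts and a nested dict stands in for the struct. The function parse_column and the column names payload and payload_parsed are hypothetical and chosen only for illustration; the name of the struct column that the gem actually produces is not specified here.

```python
import json


def parse_column(rows, source_column, output_column):
    """Return new rows that keep every input column and add the parsed content."""
    parsed_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so all input columns are preserved
        new_row[output_column] = json.loads(row[source_column])
        parsed_rows.append(new_row)
    return parsed_rows


rows = [{"id": 1, "payload": '{"city": "Oslo", "zip": "0150"}'}]
for out_row in parse_column(rows, "payload", "payload_parsed"):
    print(sorted(out_row))                   # input columns plus the new column
    print(out_row["payload_parsed"]["city"])  # nested field access on the parsed value
```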

Example code

Tip: To see the generated source code of your project, switch to the Code view in the project header.

This example shows the code to parse XML.

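from pyspark.sql import DataFrame, SparkSession

def xml_column_parser(spark: SparkSession, in0: DataFrame) -> DataFrame:
    from prophecy.libs.utils import xml_parse

    # "parseAuto" requests automatic schema inference for the parsed column.
    return xml_parse(in0, "XML", "parseAuto", None, None)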