XML

The XML (Extensible Markup Language) file type:

  • Transfers data between two systems that store the same data in different formats.
  • Supports structured data with nested elements.

Parameters

| Parameter | Tab | Description |
| --- | --- | --- |
| Location | Location | File path of the XML file to read from or write to. |
| Schema | Properties | Schema to apply on the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |

Source

The Source gem reads data from XML files and allows you to optionally specify the following additional properties.

Source properties

| Property name | Description | Default |
| --- | --- | --- |
| Enforce Schema | Whether to use the schema you define. | true |
| Row Tag | Row tag of your XML file to treat as a row. | _ |
| Exclude Attributes | Whether to exclude attributes in elements. | false |
| Null Value | Sets the string representation of a null value. | null |
| Parser Mode | How to handle corrupt data. For a list of the possible values, see Supported parser modes. | Permissive |
| Attribute Prefix | Prefix for attributes to differentiate them from elements. | None |
| Value Tag | Tag to use for the value when an element has attributes but no child elements. | _VALUE |
| Ignore Surrounding Spaces | Whether to skip whitespace surrounding values. | false |
| Ignore Namespace | Whether to skip namespace prefixes on XML elements and attributes. | false |
| Timestamp Format | Sets the string that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
| Date Format | String that indicates a date format. | yyyy-MM-dd |
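
To make the interaction between Row Tag, Attribute Prefix, Value Tag, and Exclude Attributes more concrete, here is a minimal sketch using only Python's standard library. It is an illustration of the documented behavior, not the gem's actual implementation. The sample document, the function names `element_to_value` and `read_rows`, and the choice of `_` as an explicit attribute prefix are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: each <book> element is one row.
SAMPLE = """<catalog>
  <book id="b1"><title>Dune</title><price currency="USD">9.99</price></book>
  <book id="b2"><title>Emma</title><price currency="GBP">4.50</price></book>
</catalog>"""


def element_to_value(elem, attribute_prefix="_", value_tag="_VALUE",
                     exclude_attributes=False):
    """Convert one element into a plain string or a dict of fields."""
    attrs = {}
    if not exclude_attributes:
        # The prefix keeps attribute names apart from child element names.
        attrs = {attribute_prefix + k: v for k, v in elem.attrib.items()}
    children = list(elem)
    if children:
        record = dict(attrs)
        for child in children:
            # Simplification: a repeated child tag overwrites the earlier one.
            record[child.tag] = element_to_value(
                child, attribute_prefix, value_tag, exclude_attributes)
        return record
    if attrs:
        # Attributes but no child elements: the text goes under the value tag.
        return {**attrs, value_tag: elem.text}
    return elem.text


def read_rows(xml_text, row_tag, **options):
    """Return one record per element whose tag equals row_tag."""
    root = ET.fromstring(xml_text)
    return [element_to_value(e, **options) for e in root.iter(row_tag)]


rows = read_rows(SAMPLE, row_tag="book")
print(rows[0])
```

With attributes included, the `price` element becomes a nested record holding the prefixed attribute and the value tag. Passing `exclude_attributes=True` collapses it to the plain text value.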

Supported parser modes

| Mode | Description |
| --- | --- |
| Permissive | Put the malformed string into the corrupt records column, and set the malformed fields to null. |
| Drop Malformed | Ignore the entire corrupted record. This mode is not supported in the CSV built-in functions. |
| Fail Fast | Throw an exception when the parser encounters a corrupted record. |
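
The three modes can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column name `_corrupt_record`, the field list, and the function `parse_records` are hypothetical, and each record is parsed independently here, which may differ from how the underlying reader splits a file.

```python
import xml.etree.ElementTree as ET

FIELDS = ("id", "name")             # hypothetical schema columns
CORRUPT_COLUMN = "_corrupt_record"  # hypothetical corrupt records column


def parse_records(raw_records, mode="Permissive"):
    """Parse raw XML row strings according to the chosen parser mode."""
    rows = []
    for raw in raw_records:
        try:
            elem = ET.fromstring(raw)
        except ET.ParseError as exc:
            if mode == "Fail Fast":
                # Stop at the first corrupted record.
                raise ValueError(f"Malformed record: {raw!r}") from exc
            if mode == "Drop Malformed":
                # Skip the whole record.
                continue
            # Permissive: keep the raw string, set the fields to null.
            row = {name: None for name in FIELDS}
            row[CORRUPT_COLUMN] = raw
            rows.append(row)
            continue
        row = {name: elem.findtext(name) for name in FIELDS}
        row[CORRUPT_COLUMN] = None
        rows.append(row)
    return rows


records = [
    "<row><id>1</id><name>Ada</name></row>",
    "<row><id>2</id><name>Bob</row>",  # <name> is never closed: malformed
]
permissive_rows = parse_records(records, mode="Permissive")
dropped_rows = parse_records(records, mode="Drop Malformed")
```

Here `permissive_rows` keeps both records, with the second one carried in the corrupt records column, while `dropped_rows` contains only the well-formed record. Calling `parse_records(records, mode="Fail Fast")` raises on the second record.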

Target

The Target gem writes data to XML files and allows you to optionally specify the following additional properties.

Target properties

| Property name | Description | Default |
| --- | --- | --- |
| Row Tag | Row tag of your XML file to treat as a row. | _ |
| Root Tag | Root tag of your XML file. | ROWS |
| Null Value | Sets the string representation of a null value. | null |
| Attribute Prefix | Prefix for attributes to differentiate them from elements. | None |
| Value Tag | Tag to use for the value when an element has attributes but no child elements. | _VALUE |
| Timestamp Format | Sets the string that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
| Date Format | String that indicates a date format. | yyyy-MM-dd |
| Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | None |
| Partition Column | List of columns to partition the XML file by. | None |
| Compression Codec | Compression codec to use when writing the XML file. Supported codecs: none, bzip2, gzip, lz4, snappy, and deflate. | None |
| XML Declaration | XML declaration content to write at the beginning of the XML file, before the root tag. | version="1.0" encoding="UTF-8" standalone="yes" |
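
As a rough picture of how Root Tag, Row Tag, Null Value, XML Declaration, and Compression Codec shape the output, the following standard-library sketch builds an XML string from a list of records and optionally gzips it. The function name `rows_to_xml`, the sample rows, and the `person` and `NA` values are assumptions for illustration. In particular, writing the null string as element text is a simplification, and the gem's writer may treat nulls differently.

```python
import gzip
import xml.etree.ElementTree as ET


def rows_to_xml(rows, row_tag, root_tag="ROWS", null_value="null",
                declaration='version="1.0" encoding="UTF-8" standalone="yes"'):
    """Serialize a list of flat dicts as <root_tag><row_tag>...</row_tag></root_tag>."""
    root = ET.Element(root_tag)
    for row in rows:
        row_elem = ET.SubElement(root, row_tag)
        for name, value in row.items():
            child = ET.SubElement(row_elem, name)
            # Simplification: nulls are written as the configured string.
            child.text = null_value if value is None else str(value)
    body = ET.tostring(root, encoding="unicode")
    # The declaration goes first, before the root tag.
    return f"<?xml {declaration}?>\n{body}"


sample_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": None}]
xml_text = rows_to_xml(sample_rows, row_tag="person", null_value="NA")

# Stand-in for the gzip codec: compress the encoded document.
compressed = gzip.compress(xml_text.encode("utf-8"))
```

Decompressing `compressed` with `gzip.decompress` yields the original document bytes, so the codec only changes how the file is stored, not its XML content.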

Supported write modes

| Write mode | Description |
| --- | --- |
| error | If the data already exists, throw an exception. |
| overwrite | If the data already exists, overwrite the data with the contents of the DataFrame. |
| append | If the data already exists, append the contents of the DataFrame. |
| ignore | If the data already exists, do nothing with the contents of the DataFrame. This is similar to the CREATE TABLE IF NOT EXISTS clause in SQL. |
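
The decision logic behind the four write modes can be sketched as below. This is a simplified illustration, not the gem's implementation: a single local file stands in for the target location, and the function name `write_with_mode` and the file name `out.xml` are hypothetical. A real DataFrame writer typically works on a directory of part files rather than appending text to one file.

```python
import os
import tempfile

WRITE_MODES = ("error", "overwrite", "append", "ignore")


def write_with_mode(path, content, mode):
    """Write content to path, honoring the write mode. Returns True if written."""
    if mode not in WRITE_MODES:
        raise ValueError(f"Unknown write mode: {mode}")
    exists = os.path.exists(path)
    if exists and mode == "error":
        raise FileExistsError(f"Data already exists at {path}")
    if exists and mode == "ignore":
        # Leave the existing data untouched.
        return False
    file_mode = "a" if mode == "append" else "w"
    with open(path, file_mode, encoding="utf-8") as handle:
        handle.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.xml")
    write_with_mode(target, "<a/>", "error")      # nothing exists yet, so this writes
    write_with_mode(target, "<b/>", "append")     # file now holds <a/><b/>
    write_with_mode(target, "<c/>", "ignore")     # existing data is left as-is
    write_with_mode(target, "<d/>", "overwrite")  # file now holds only <d/>
```

Note that `error` only raises when data is already present. On an empty location it behaves like a normal write.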