Split XML And Repeat Header Element
This article shows how to use the XmlSplit program to split an XML document and include the header element from
the source document in each of the split files. It also demonstrates how to skip content that precedes the point where
splitting starts and how to split on elements deep in the XML hierarchy.
The term "header element" is not defined in the XML specification, but developers widely use it
to refer to the first element under the root. A header element is typically a
unique element that contains identifying or descriptive information about the XML document. When a
document must be split into smaller files, the header element often has to be inserted into each
of the split files.
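To make the definition concrete, here is a minimal Python sketch that picks out the first element under the root. It is not part of XmlSplit. The sample document is hypothetical: only the name TransportHeader comes from this article, and Import, Sender, Created and Body are made-up names used for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only "TransportHeader" is taken from the article.
SAMPLE = """<Import>
  <TransportHeader>
    <Sender>ACME</Sender>
    <Created>2012-04-23</Created>
  </TransportHeader>
  <Body/>
</Import>"""


def find_header(xml_text):
    """Return the first child element of the root, or None if the root has no children."""
    root = ET.fromstring(xml_text)
    # Iterating an Element yields its direct children in document order.
    return next(iter(root), None)


header = find_header(SAMPLE)
print(header.tag)
```

Treating "first child of the root" as the header is a convention, as noted above, so a real document may need a check on the element name as well.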
The header element in the file below is TransportHeader.
ImportData.xml
The objective is to split this file into five files, each containing the header element and one of
the "R" elements.
The first split file created is shown below. The other four split files created differ only in the
"R" element that appears in the file.
test1.xml
Often the content to be split sits within a descendant element, and the content
before it should be skipped. In our sample file, the content to be split consists of the "R" elements. With the
exception of the header element, the elements before the first "R" element should be skipped.
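The idea of skipping everything before the first "R" element, and of measuring how deep that element sits, can be sketched in Python as follows. This is an illustration only, not how XmlSplit works internally. The sample structure (Import, Batch, Info, Table, Rows) is hypothetical, and counting the root as depth 1 is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure; only "TransportHeader" and "R" come from the article.
SAMPLE = """<Import>
  <TransportHeader><Sender>ACME</Sender></TransportHeader>
  <Batch>
    <Info>not wanted in the split files</Info>
    <Table>
      <Rows>
        <R id="1"/>
        <R id="2"/>
      </Rows>
    </Table>
  </Batch>
</Import>"""


def first_match_depth(root, name):
    """Walk the tree in document order until an element called `name` is found.

    Returns (depth, skipped), where depth counts the root as 1 (an assumption)
    and skipped lists the tags of the elements visited before the match.
    Returns (None, skipped) when no element matches.
    """
    stack = [(root, 1)]
    skipped = []
    while stack:
        elem, depth = stack.pop()
        if elem.tag == name:
            return depth, skipped
        skipped.append(elem.tag)
        # Push children in reverse so they are popped in document order.
        stack.extend((child, depth + 1) for child in reversed(list(elem)))
    return None, skipped


depth, skipped = first_match_depth(ET.fromstring(SAMPLE), "R")
print(depth, skipped)
```

In this hypothetical sample the first "R" is found at depth 5, and every element listed in `skipped` precedes it in document order.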
XmlSplit has command line arguments for handling all of these requirements. The XmlSplit Script Wizard was used
to automatically generate the script by selecting the necessary arguments and setting their values with a simple
dialog.
The script below uses the /H (Header) argument to write the header element to each split file. Each split file
is to contain one row item, the R element, in addition to the header element. The /S (Split method) argument is
set to 1, meaning the first split method, splitNthElement, is to be used, and the /F (Frequency) argument is
set to 1 to split after each R element. Since the R elements are at a depth of 5, the /D (Depth) argument is
set to 5. To exclude all nodes up to the first R element, the /T (Threshold) argument is set to /T=R, which
tells XmlSplit to skip all nodes until an element named "R" is reached. The /R (Root) argument is set to the
name of the root element in ImportData.xml and is used to wrap the content of each split file so that each is
a well-formed XML document.
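For readers who want to see the combined effect of these settings, the following Python sketch approximates the result: one output document per R element at a given depth, each repeating the header and wrapped in the root element. It is not XmlSplit and does not reproduce its implementation or its command line. The sample document and the function names are hypothetical, the root is counted as depth 1 by assumption, and the split frequency is fixed at one element per output.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input; only "TransportHeader" and "R" come from the article.
SAMPLE = """<Import>
  <TransportHeader><Sender>ACME</Sender></TransportHeader>
  <Batch>
    <Info>skipped</Info>
    <Table>
      <Rows>
        <R id="1"/><R id="2"/><R id="3"/><R id="4"/><R id="5"/>
      </Rows>
    </Table>
  </Batch>
</Import>"""


def elements_at_depth(elem, name, depth, current=1):
    """Yield elements called `name` that sit exactly `depth` levels down (root = 1)."""
    if current == depth:
        if elem.tag == name:
            yield elem
        return
    for child in elem:
        yield from elements_at_depth(child, name, depth, current + 1)


def split_with_header(xml_text, row_name, depth, root_name):
    """Return one XML string per matching row, each with a copy of the header."""
    root = ET.fromstring(xml_text)
    header = next(iter(root), None)  # first element under the root
    documents = []
    for row in elements_at_depth(root, row_name, depth):
        out = ET.Element(root_name)  # wrapper keeps each output well-formed
        if header is not None:
            out.append(copy.deepcopy(header))
        out.append(copy.deepcopy(row))
        documents.append(ET.tostring(out, encoding="unicode"))
    return documents


parts = split_with_header(SAMPLE, row_name="R", depth=5, root_name="Import")
print(len(parts))
```

Everything between the header and the first R element (Batch, Info, Table and Rows in this sample) is left out of the output, which mirrors the skipping behavior described above. Each string in `parts` could then be written to its own file.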
Submitted by Bill Conniff, Founder of Xponent, on April 23, 2012