Introduction

XmlSplit consists of two programs for splitting large XML files into smaller, well-formed XML files.

The source XML document must be a well-formed XML file (correct XML syntax, element nesting, etc.) based on the XML 1.0 specification of the W3C.

XmlSplit Script Wizard

XML Viewer

The Wizard has a built-in XML file viewer, which may be hidden by clicking the "x" above it. The purpose of the Viewer is to help the user identify the structure of the XML in order to select the appropriate split method. For methods that have element depth and name options, an element can be dragged from the Viewer into the element name text box, and the depth text box is filled in automatically. The Viewer is read-only and does not have the editing and navigation features of our companion product, the XMLMax editor.

It is not necessary to load the entire XML file before splitting it. The Viewer initially reads approximately 10,000 nodes into the display area. In the licensed version, the entire file may be navigated, but the program is ready to split at any time. For a file with more nodes, dragging the scroll bar to the bottom or pressing Ctrl+End results in a prompt to access the entire file. On confirmation, the entire file is read with an XML parser in a background process, which may be cancelled at any time by pressing the Escape key. The green square at the bottom of the window changes to red during the background process and back to green upon completion.

The .NET tab control in the Wizard does not handle navigation via the Tab key. The following keyboard shortcuts have been added to navigate between the tabs: Alt+F=Files, Alt+M=Methods, Alt+O=Options, Alt+C=Script, Alt+E=Merge. A shortcut has also been added for the Viewer: Alt+X=XML.

In the trial version, the Viewer is limited to the initial 10,000 nodes, but the program will still split the entire XML file.

Using The Wizard

1. Select the Required files.

2. Choose the Method for splitting the XML file.

3. Select the Options tab to choose the desired options.

4. Choose the Split tab to split the XML to disk files or preview them in a window.

5. Select the Script tab to create scripts that call the xmlsplit.exe console program.

6. Choose the Merge tab to merge multiple XML files into a single XML file.

Required Files
There are two required file inputs: the XML file to be split and the path of the split files. The specified output name is used as a base name for the auto-numbered split files that are written to disk. They are named by inserting a one-based counter at the end of the file name, before the extension. For example, if the default output file split.xml is used, the split files are split1.xml, split2.xml, split3.xml...
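
The naming rule can be illustrated with a short Python sketch. This is only an illustration of the rule as described here, not the program's own code; the function name split_file_name is hypothetical.

```python
from pathlib import PureWindowsPath


def split_file_name(base_name: str, counter: int) -> str:
    """Insert a one-based counter just before the file extension."""
    if counter < 1:
        raise ValueError("the counter is one-based")
    path = PureWindowsPath(base_name)
    # stem is the name without the extension, suffix is the extension itself
    return str(path.with_name(f"{path.stem}{counter}{path.suffix}"))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(split_file_name("split.xml", n))
```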

If the XML file is a USPTO patent application file containing multiple XML fragments, special handling is provided for splitting it. These files are not well-formed XML because the XmlDeclaration node that begins with "?xml" occurs multiple times. The following file name requirements are used to verify that a file is a USPTO patent application file: the name must start with ipa or pa, followed by 6 digits, of which the middle 2 must be a month value and the last 2 a day value. If verified, the file will open in the Viewer even though it is not a proper XML file. There is no need to choose a split method or select any of the options because the file will be split at the XmlDeclaration node that marks the beginning of each XML fragment. This feature is currently available in the Wizard only; it is not supported by the console program xmlsplit.exe.
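
A minimal Python sketch of the file name check described above is shown here. It is an illustration only. The function name is hypothetical, and the following are assumptions not stated in this document: the prefix is matched without regard to case, the month range is 01-12, the day range is 01-31, and anything after the six digits (such as the extension) is ignored.

```python
import re

# Prefix "ipa" or "pa", then three two-digit groups (assumed to be year, month, day).
_USPTO_NAME = re.compile(r"^(ipa|pa)(\d{2})(\d{2})(\d{2})", re.IGNORECASE)


def looks_like_uspto_file(file_name: str) -> bool:
    """Return True if the file name matches the USPTO naming pattern."""
    match = _USPTO_NAME.match(file_name)
    if match is None:
        return False
    month = int(match.group(3))  # middle two digits
    day = int(match.group(4))    # last two digits
    return 1 <= month <= 12 and 1 <= day <= 31


if __name__ == "__main__":
    for name in ("ipa120105.xml", "pa031230.xml", "ipa121305.xml", "patent.xml"):
        print(name, looks_like_uspto_file(name))
```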

Split Methods

Split method descriptions: Miscellaneous Methods.

The following methods split at depth = 1 only. They ignore XML nodes prior to the first element with depth = 1; to write those nodes at the start of each split file, use the Include file option. You may need to use the Root element option if the split files are not well-formed XML. The miscellaneous methods do not support either the Preserve Structure or Header element options.

Options Tab

Default settings in the Options tab are those that work best for the chosen Method. Options that are not enabled or are unchecked are not applied during the splitting process.

Split Tab

Splitting the XML file has two options: write the split files to disk or preview them in a window.

Split Results Summary: When the XML is split to disk files, they are listed in a summary. Double-clicking a file in the summary loads it into a text viewer. The save button enables the summary to be saved to disk in list format or .csv format. If the inputs checkbox is checked, the saved summary includes a header section with the user inputs from the Files, Methods and Options tabs. If a split is cancelled before it finishes, the summary file list from the previous split is cleared, but the split files are never deleted from disk.

Script Tab

Creates scripts for executing the xmlsplit.exe console program in the selected scripting language, Windows Script Host (WSH) or PowerShell (PS). They may be run from this tab in a Windows command window or PowerShell window. Scripts are shown in an editable text box to facilitate user modifications, such as removing the echo statement.

The save dialog defaults to the .vbs file extension for WSH scripts and .ps1 for PowerShell scripts, and these files may be run directly from Windows File Explorer.

The WSH command-line script may not be saved to a file because it does not include the WSH scripting object needed to run it from a file.

Windows Script Host and PowerShell are Microsoft products which must be installed and configured on your computer.

When done, the exit code returned by the xmlsplit.exe program is displayed. Zero means the operation was successful.

The command to use in a Windows command batch file should be identical to the WSH command-line script; enclose the full path for xmlsplit.exe in quotes unless it has no spaces and is a short file name.

Note on older versions: when a script is created, the commandline argument for the selected method is a number. The order in which the methods are listed in the Methods tab was changed in recent versions but for backward compatibility with older scripts, the commandline argument numbers are unchanged. For example, the split size method argument is 6.

If changes are made in the Files, Methods or Options tabs after a script has been created, click the Create button again to generate a new script with the changes.

Merge

Merge multiple XML files into a single, well-formed XML file.

Performs a simple merge that concatenates the selected XML files in a manner that results in a well-formed XML output file. This is accomplished by reading each file and writing to the output file only the content within the root element of each file. The root elements, and any nodes that occur before the root, are not written. The only exception is that if the first selected file has a DocumentType node [DOCTYPE], it is written. This is done in case the XML content contains entity references declared in the DocumentType node. An input box is provided for a custom root element, which is applied to the output file. If it is left empty, the root element in the first selected file is applied.
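
The idea of the merge can be sketched in Python with the standard xml.etree.ElementTree module. This is a simplified illustration, not the program's implementation: it works on in-memory strings, copies only the child elements of each root, and does not handle the DOCTYPE exception described above. The function name merge_documents is hypothetical, and copying the attributes of the first root when no custom root is given is an assumption.

```python
import xml.etree.ElementTree as ET


def merge_documents(documents, root_name=None):
    """Concatenate the children of each document's root under one new root."""
    roots = [ET.fromstring(text) for text in documents]
    if root_name:
        merged = ET.Element(root_name)  # custom root element
    else:
        first = roots[0]
        # Assumption: reuse the first root's name and attributes.
        merged = ET.Element(first.tag, dict(first.attrib))
    for root in roots:
        merged.extend(list(root))  # the root elements themselves are dropped
    return ET.tostring(merged, encoding="unicode")


if __name__ == "__main__":
    print(merge_documents(["<a><x/></a>", "<b><y/><z/></b>"]))
```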

Arguments For the Command Line splitter xmlsplit.exe

Each argument may optionally begin with a single forward slash (new in version 1.3.4). Otherwise, the first character of each argument is a single letter (upper or lower case) denoting the argument, followed by an equals sign, followed by a value. A single space separates each argument. Spaces within an argument are not allowed, with the exception of file names, which must be enclosed in double quotes, and the /r argument if attributes are included in the root element. The arguments may appear in any order. Note that arguments specifying element or attribute names are case-sensitive because XML is case-sensitive.

If XmlSplit completes the operation successfully, an exit code of zero is returned. If an error occurs, a positive integer is returned. The meanings of the exit codes are listed in the Exit Codes section.
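
The argument syntax described above can be sketched as a small Python helper that assembles a command line string. This is an illustration only. The function name build_command and the installation path used in the example are hypothetical. The helper always quotes the file arguments (x, o, i, p) and quotes any other value that contains a space, which is one reasonable reading of the rules above.

```python
FILE_ARGUMENTS = ("x", "o", "i", "p")  # arguments whose values are file names


def build_command(exe_path, **arguments):
    """Build a command line such as: "xmlsplit.exe" /x="in.xml" /s=1 ..."""
    parts = [f'"{exe_path}"']  # quote the program path in case it has spaces
    for letter, value in arguments.items():
        text = str(value)
        if letter in FILE_ARGUMENTS or " " in text:
            text = f'"{text}"'
        parts.append(f"/{letter}={text}")
    return " ".join(parts)  # a single space separates each argument


if __name__ == "__main__":
    # The installation path below is a made-up example.
    print(build_command(
        r"C:\Program Files\XmlSplit\xmlsplit.exe",
        x=r"c:\temp\myXML.xml",
        o=r"c:\temp\output.xml",
        s=1, d=1, f=1000,
    ))
```

The resulting string could be pasted into a batch file or compared with the script generated by the Wizard.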

/? Displays a brief definition of all the arguments in the console window. If followed by other arguments, the script is not executed, but the values of all arguments on the command line are displayed in the console window, which may be helpful in diagnosing problems with a script.

/a=attribute name example: /a=ID specifies an attribute named "ID"

Applies only to the third split method. The XML file is read and when an element having an attribute with the specified name and at the specified depth occurs, its value is stored. All nodes and content are written to the output file until an element is read that has the specified attribute at the specified depth and the value of the attribute is different from the stored attribute value. At that point, the output file is closed and the next output file is started.
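
The grouping logic of this method can be sketched in Python as follows. This is a simplified illustration, not the program's implementation: it works on an in-memory string, considers only elements at depth 1 (the children of the root), and returns groups of elements instead of writing files. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def group_by_attribute_change(xml_text, attribute):
    """Start a new group each time the attribute value differs from the stored one."""
    root = ET.fromstring(xml_text)
    groups = [[]]
    stored = None  # last attribute value seen
    for element in root:  # children of the root are at depth 1
        value = element.get(attribute)
        if value is not None:
            if stored is not None and value != stored:
                groups.append([])  # value changed: begin the next output group
            stored = value
        # elements without the attribute stay in the current group
        groups[-1].append(element)
    return groups


if __name__ == "__main__":
    sample = '<r><i ID="a"/><i ID="a"/><i ID="b"/><n/><i ID="c"/></r>'
    for group in group_by_attribute_change(sample, "ID"):
        print([e.get("ID") for e in group])
```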

/b=Write Byte Order Mark example: /b=1 specifies that no Byte Order Mark (BOM) is to be written.

Determines whether the split files will have a BOM written at the start of the file. The BOM is used by XML parsers and other computer programs to determine the encoding. If the encoding is UTF-16 and no BOM is written to the split files, some XML programs, including XMLMax, will not be able to read the split files. The default value is 1: if the b parameter is omitted, no BOM is written. Note that, as per convention, no BOM is written for iso-8859-1 files regardless of which value is specified. A BOM is always written for UTF-16 big-endian because without it many programs will assume little-endian, which is more common in MS Windows files.

1: No BOM is written
2: A BOM is written for UTF-8 and UTF-16 encoded files.
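
To check whether a split file actually starts with a BOM, a small Python sketch such as the following may help. It uses the BOM constants from the standard codecs module; the function name detect_bom is hypothetical, and only the encodings discussed in this section are covered.

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(detect_bom("<a/>".encode("utf-8-sig")))  # utf-8-sig prepends a BOM
    print(detect_bom(b"<a/>"))                     # no BOM present
```

For a file on disk, passing the first few bytes read in binary mode is sufficient.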

/c=encoding example: /c=2 specifies UTF-16 encoding.

Optional. The encoding to use when creating the output files. If this argument is missing or is not a valid value, files are written in the encoding of the source XML document, provided it is one of the encodings listed below. Note that if iso-8859-1 is specified, an XmlDeclaration node is automatically inserted at the beginning of each split file with the encoding attribute set to iso-8859-1 because that is the only way most XML parsers are able to distinguish iso-8859-1 from UTF-8. The exception is when an include file (/i argument) is specified and the include file contains an XmlDeclaration, in which case it is up to the user to ensure that the XmlDeclaration contains the appropriate value for the encoding attribute. Valid values are:

1: UTF-8
2: UTF-16
3: iso-8859-1
4: UTF-16 big-endian

/d=depth example: /d=2 specifies a depth of two

Required. Depth in the XML hierarchy. It must be a positive integer. The root element always has a depth of zero. The default value is one because that is the value most commonly used. A value greater than one is generally used only in special circumstances, and caution should be exercised because it may produce unexpected results, including split files that are not well-formed XML.

/e=element name example: /e=author

Optional. Applies to split method 1. If specified, a new split file is created when the name of the element being read is identical to the name specified by this argument and all other criteria for the split method are met.

/f=frequency example: /f=1000 specifies that a new split file is started after every 1000 elements.

Required for split method 1 only. The XML file is read and all nodes and content are written to the output file. Elements at the specified depth are counted until the frequency value is reached. At that point, the output file is closed and the next output file is started.
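
The counting rule can be sketched in Python as follows. This is a simplified illustration, not the program's implementation: it works on an in-memory string, counts only elements at depth 1, and returns groups of elements instead of writing files. The function name split_every_n is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_every_n(xml_text, frequency):
    """Group the depth-1 elements into chunks of 'frequency' elements each."""
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    children = list(ET.fromstring(xml_text))  # elements at depth 1
    return [children[i:i + frequency] for i in range(0, len(children), frequency)]


if __name__ == "__main__":
    sample = "<r><a/><b/><c/><d/><e/></r>"
    for chunk in split_every_n(sample, 2):
        print([e.tag for e in chunk])
```

The last chunk may hold fewer elements than the frequency value when the element count is not an exact multiple.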

/g=search string example: /g=book

Optional. Applies to split method 5. A new split file is created when one of the node types specified in the /n argument occurs and the value of the node contains the text specified by this argument.

/h=header element example: /h=true

Optional. Applies to all split methods. The term "header element" is used here to refer to the first element under the root element. If this argument is used, the header element, including all of its descendant nodes, is written to each of the split files. Note that the value provided for this argument can be any string and does not have to be the name of the header element. "true" is used in the above example.

/i=include file. example: /i="c:\temp\includethisfile.xml"

Optional. Identifies a file to insert in each output file.

If the root parameter is not specified, the include file is inserted at the beginning of each split file.

If the root parameter is specified, the insertion point of the include file is determined by the occurrence of an XmlDeclaration in the include file:
if it starts with an XmlDeclaration the include file is inserted at the beginning of each split file;
otherwise it is inserted immediately after the start tag of the root element.
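
The insertion rule above can be summarized in a short Python sketch. It is an illustration of the rule as described here, not the program's code; the function name and the two return labels are hypothetical.

```python
import re

# An XmlDeclaration starts with "<?xml" followed by whitespace; this avoids
# mistaking a processing instruction such as "<?xml-stylesheet" for it.
_XML_DECLARATION = re.compile(r"<\?xml\s")


def include_insertion_point(include_text, root_specified):
    """Return where the include file is inserted in each split file."""
    if not root_specified:
        return "start_of_file"
    if _XML_DECLARATION.match(include_text):
        return "start_of_file"
    return "after_root_start_tag"


if __name__ == "__main__":
    print(include_insertion_point('<?xml version="1.0"?><!-- note -->', True))
    print(include_insertion_point("<!-- note -->", True))
```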

The include file may begin with an XmlDeclaration node and may contain comments and other node types. It may be used in conjunction with an append file. It does not have to be a well-formed XML file, but each node in it must have correct XML syntax.

/j=Preserve Structure example: /j=true

Optional. Applies to split methods A. This option ensures that all split files have the identical structure as the source XML document from the start of the file up to the first element that matches the split criteria. Each node that occurs in the source XML prior to the first element that matches the split criteria is written to the start of each split file. If any of these nodes are open elements, their corresponding end tags are written to the end of each split file to ensure that all elements are closed and properly nested.

Note: this option is not compatible with the following options: root element, header element, include file, append file, WriteDTD. These options are ignored if the Preserve Structure option is specified.

/n=node list example: /n=comment,cdata,processinginstruction

Optional. Applies to split method 5. This argument must be a comma-delimited list containing any of the following node types: comment, cdata or processinginstruction. A new split file is created when any of the node types listed in this argument occurs and the depth in the XML hierarchy is one.
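
Before running a script, the value intended for the /n argument can be checked with a small Python sketch like the one below. The function name parse_node_list is hypothetical, and treating the node type names as lower-case exact matches is an assumption.

```python
ALLOWED_NODE_TYPES = {"comment", "cdata", "processinginstruction"}


def parse_node_list(value):
    """Split a comma-delimited node list and reject unknown node types."""
    names = value.split(",")
    invalid = [name for name in names if name not in ALLOWED_NODE_TYPES]
    if invalid:
        raise ValueError(f"node types not allowed in the /n argument: {invalid}")
    return names


if __name__ == "__main__":
    print(parse_node_list("comment,cdata,processinginstruction"))
```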

/o=output file. example: /o="c:\temp\output.xml"

Required. A fully qualified file name. It must be enclosed in double quotes. Auto-incremented file names are created from this file name by inserting a number at the last position before the extension, for example, "c:\temp\output1.xml", "c:\temp\output2.xml", etc. The output file does not need to be an existing file, but XmlSplit must have the permissions needed to create files in the specified folder.

/p=append file. example: /p="c:\temp\filetoappend.xml"

Optional. Identifies a file to append to each output file. It may be used in conjunction with an include file. It does not have to be a well-formed XML file, but each node in it must have correct XML syntax.

/r=root element. example: /r=rootelement

Optional. A string comprising a properly named XML element. The specified root is used to create the root node for each split file. Note that by default the root element of the source XML file is not written to any of the split files. Therefore, the /r argument assures that each split file has a document root. Markup characters (XML angle brackets) must not be included and will result in an error if they are. XmlSplit inserts the required markup when the output files are written. Attributes are permitted. Alternatively, a root may be written to each split file by using an include file (/i argument) with the root start tag, along with any attributes, and an append file (/p argument) with the root end tag. However, do not use this method together with the /r argument, or the split files will have two roots and will not be well-formed XML.

This argument should not be enclosed in quotes with the following exception:
If attributes, such as a namespace, are included with a root in the /r argument, the entire argument must be enclosed in double quotes with single quotes around each attribute value, as in this example:
/r="book xmlns:f='http://www.xponentsoftware.com/xslt/fragmentation'"

If the attributes are enclosed in double quotes, XmlSplit will fail and return an error value for the exit code. The Wizard will automatically replace double quotes for attributes with single quotes and enclose the entire root in quotes as needed when it generates the script. The Wizard also has a command button that inserts the entire root start tag including any attributes it has into the root node textbox in the dialog. Note that when you use this button, if the attributes have double quotes they are not replaced until you click the Generate button.
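
The quoting rule for the /r argument can be sketched in Python as follows. This is an illustration of the rule as described above, not the Wizard's code. The function name format_root_argument is hypothetical, and the sketch assumes that attribute values do not themselves contain single quotes.

```python
def format_root_argument(root):
    """Format a root element name, with optional attributes, as a /r argument."""
    root = root.strip()
    if "<" in root or ">" in root:
        raise ValueError("markup characters must not be included")
    if " " not in root:
        return f"/r={root}"  # plain element name: no quotes needed
    # Attributes present: single quotes inside, double quotes around the whole value.
    return '/r="' + root.replace('"', "'") + '"'


if __name__ == "__main__":
    print(format_root_argument("rootelement"))
    print(format_root_argument(
        'book xmlns:f="http://www.xponentsoftware.com/xslt/fragmentation"'))
```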

/s=split method example: /s=1 or /S=1 specifies method number one.

The split method must be an integer in the range 1-5 as defined below; the deprecated method 6 is also still accepted:
1: split every nth element at depth equal to the depth argument.
2: split when an element's tag name changes at depth equal to the depth argument.
3: split when value of the attribute argument changes in an element at depth equal to the depth argument.
4: split when the namespace in scope changes, optionally in the specified element name and at the specified depth.
5: split when any of the node types specified by the /n argument occurs at a depth of one and optionally containing the text specified in the /g argument.
6: split when specified number of bytes are written and optional element name occurs at specified depth. Deprecated: see the /z argument.

/t=threshold element. example: /t=book

Optional. A named XML element at which to begin processing. It is used to specify the first element to split at.

/w=write DOCTYPE node. example: /w=1

Optional. An integer from 1 to 3:
If an XmlDocumentType node occurs in the source XML document, this argument controls if and when it is written. The following lists the possible values and their meaning.
1: write DOCTYPE to every split file
2: write DOCTYPE to first split file only
3: write DOCTYPE to none of the split files

The Wizard has a drop-down combo box that lists the above options. It is disabled if the XML file does not have a DOCTYPE node, in which case the script generated by the Wizard will not have the /w argument.

/x=Xml File example: /x="c:\temp\myXML.xml"

Specifies the XML file to be split. The file name must be enclosed in double quotes if it contains a space.

/z=Split File Size example: /z=5000000

Used with split method 6. The XML file is split after the specified number of bytes have been written. Note: split method 6 was removed from the Wizard Methods tab in version 3100, and since then the Wizard does not create scripts that include it. For backward compatibility, recent versions of xmlsplit.exe still support this argument and the method argument /s=6.

Exit Codes

Upon completion of execution, the console program XmlSplit.exe returns one of the following exit codes. The program will return an exit code related to Windows Registry access failure if run without sufficient Windows permission. If the exit code is not zero, XmlSplit.exe writes a log file with the exit code and all command-line arguments to xmlsplit error log.txt in the user's AppData\local\xponent folder.

If run from a PowerShell script, the exit code is the exit code from the script, which may not be the exit code from XmlSplit.exe. If the script is run from the Wizard, the Wizard is able to capture the exit code from XmlSplit.exe and display it.

0. operation was successful.

1. xml source document file was not found or was omitted.

2. error reading xml file, most likely due to it not being well-formed.

3. "s" argument for the split option is not in the valid range or was omitted.

4. "f" argument for frequency is not a positive integer or is required for the specified option and omitted.

5. "d" argument for xml depth is not a positive integer or is required for the specified option and omitted.

6. "a" argument for an attribute name is required for the specified option and is missing, or is not a valid xml name.

7. "e" argument for an element name is required for the specified option and is missing, or is not a valid xml name.

8. "o" argument for the output file name is not a valid file name or is required for the specified option and omitted.

9. "r" argument for a root element name is not a valid xml name.

10. A file operation failed, most likely while attempting to create a split file based on a bad output file name parameter. It could also be an invalid path or insufficient permission for the program to create the file.

11. An internal error(bug). Please report this to the vendor.

12. File specified by the "i" argument for an include file was not found or could not be opened for reading.

13. Error reading registry. Possibly due to insufficient Windows permission. Try Run as administrator.

14. The registration key stored in the registry is not a valid key for XmlSplit.

15. Registration failed because the license key is not valid or the user does not have permission to read it from the Registry. Run this application with elevated permission (try Run as administrator).

16. The size of the XML file exceeds the size limit for unregistered versions of the program.

17. Execution ended because trial period has expired.

18. File specified by the "p" argument for an Append file was not found or could not be opened for reading.

19. Argument is not properly formatted, e.g., missing equal sign.

20. The /n argument is missing or includes a node type not allowed in this argument.

21. Online registration of the key used with the "k" argument failed. This may occur for a number of reasons. You must be logged in as the same user that installed XmlSplit. Please email the vendor at support@xponentsoftware.com and include your registration key.

22. The registration code used with the "k" argument was already registered. Please email the vendor at support@xponentsoftware.com and explain why you are activating again.

23. The program was denied access to the Windows Registry, most likely due to insufficient Windows permission. Try Run as administrator.

24. Registration failed because the user does not have permission to access the Windows Registry. Try Run as administrator.

25. Registration failed because one of the Windows Registry keys created during installation of XmlSplit was not found.

26. Online registration failed because there is no internet connection or access to the Xponent website is blocked by a firewall.

27. Online registration failed because the Xponent webserver was not available or was unable to process the request. Please notify support@xponentsoftware.com and include your registration key.

29. Program exited for undetermined reason, possibly an internal bug.

30. License key must be verified because a newer version has been installed. Use the XmlSplit Wizard to re-register.

32. A file or folder error occurred. Possibly the program was unable to read or write to a file in the Windows Common Applications Data Folder or in the user's profile folder.

33. The maximum number of licensed users was exceeded. If running the program on a server, each user is required to have one license. If the program exits abnormally, due to a power failure, for example, the user count is not decremented which can result in this error. The user count may be reset to zero by running the Wizard with elevated permission and while no other users are running either the Wizard or the XmlSplit.exe console application. Simply run the Wizard in this manner and then exit to reset the count.

34 and 35. A non-valid license was found in the Registry in the location for this program.

36. Unable to lock the license file, which prevented the program from writing to it. This can result in an incorrect number of concurrent users, possibly exceeding the number of licenses and thereby preventing additional users from using the program.

37 and 38. Unable to read or update data in the license file, likely due to a missing XML element or an XML syntax error in the file.

39. License file not found.

40. Missing or incorrect data in Activation Code File.

41. Version number in Activation Code File does not match the installed application.

42. Activation Code File not found; it must be in the Windows Local Application Data folder.
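
When xmlsplit.exe is called from a script, the exit code can be translated into a readable message. The Python sketch below shows one way to do this for a handful of the codes listed above; it is an illustration only, the names EXIT_CODES and describe_exit_code are hypothetical, and the table is intentionally incomplete.

```python
# A few of the exit codes from the list above (not the complete list).
EXIT_CODES = {
    0: "operation was successful",
    1: "xml source document file was not found or was omitted",
    2: "error reading xml file, most likely due to it not being well-formed",
    10: "a file operation failed",
    17: "execution ended because trial period has expired",
}


def describe_exit_code(code):
    """Return a short description for an exit code returned by xmlsplit.exe."""
    return EXIT_CODES.get(code, f"exit code {code}: see the Exit Codes list")


if __name__ == "__main__":
    # In a real script the code would typically come from
    # subprocess.run(command).returncode.
    for code in (0, 2, 28):
        print(code, describe_exit_code(code))
```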