
Xponent's Mostly XML Blog


Editing Large XML Files

If you need to modify a large XML file, you may have difficulty finding a suitable editor, especially if the file is in the gigabyte size range.

Large XML files in the multi-gigabyte range are more common than one might think. Recent Google search results for "Edit large XML file" included several listings where the poster asked what tools can be used to edit XML files in the gigabyte range.

The largest single XML file I know of is an OpenStreetMap file. The one I tried is over 270 GB uncompressed. The compressed file is 29 GB, available at http://download.geofabrik.de/europe-latest.osm.bz2. While files this large are uncommon, XML files in the 0.5 MB to 2 GB range are common, and we have many customers with files in that range.

When it comes to editing large XML files, the need is usually limited to very minor changes rather than authoring the XML from scratch. It may be tempting to find a free or inexpensive text editor for the task. It may be difficult to justify the expense of a feature-rich XML editor just to make one or two small changes.

Here are a few important considerations when choosing a tool for editing large XML files.

Text Editors

Some text editors offer limited support for XML files, such as syntax coloring of the XML markup, content, and node types. Some also enable collapse and expansion of XML elements. I know of none where these features work for really large files in the multi-gigabyte range. Most load the entire file into memory, so available memory limits the size of file that can be processed.

Text editors typically do not hide XML markup characters (the brackets, slashes, question marks, and exclamation marks used to mark the start and end of XML nodes). If these characters are in the editing window, you must be careful not to modify them, or to insert any character where it is not allowed, because XML is very strict about syntax and the use of markup characters. This can be an even larger problem if the markup characters are entitized. For example, the entity representation for the left angle bracket is "&lt;".

Entitized XML is not very readable:

[Image: entitized XML]

It is even worse if the XML has no whitespace, as shown below, which is common with XML that is sent over the Internet.

[Image: entitized XML with no whitespace]
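To give a rough idea of what entitized markup looks like in a text editor, here is a minimal Python sketch using the standard library's `xml.sax.saxutils`. The sample fragment and the names `entitize` and `de_entitize` are made up for illustration; they are not taken from any real file or product.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical sample fragment, written on one line with no whitespace.
fragment = "<order><item>Tea</item><qty>2</qty></order>"


def entitize(xml_text: str) -> str:
    """Replace &, < and > with their entity references."""
    return escape(xml_text)


def de_entitize(text: str) -> str:
    """Turn &amp;, &lt; and &gt; back into the literal characters."""
    return unescape(text)


entitized = entitize(fragment)
print(entitized)

# Converting back recovers the original markup.
restored = de_entitize(entitized)
print(restored)
```

Every angle bracket becomes a four-character entity reference, which is why hand-editing such text without breaking it is error-prone.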

XML Editors

Many XML editors claim to support large XML files, but some features may be limited or not available. Most XML editors load the entire XML into memory using some form of the Document Object Model (DOM). Only when the entire XML tree is in memory are they able to offer a rich set of editing features such as schema-based editing. If the editor handles large XML by loading only part of the file at a time, it very likely cannot support such features.

Some XML editors switch to using a text editor when available memory either prohibits loading the XML into a DOM object or would result in poor performance. You are then faced with the issues pertaining to text editors discussed above.

Issues With Saving Large XML.

Does the editor have to make a copy of the XML first? If the XML is really huge this could be a problem. Some XML editors claim to support terabyte-size files, but if they have to make a copy of the file first, that would be expensive in time and disk space. How long does it take to save a large file? Most XML editors use the built-in XML objects of the programming language when saving XML files, but that is slower than writing byte arrays directly to disk.
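To illustrate why the copy question matters, here is a minimal Python sketch of an edit that avoids a copy entirely. It only works when the replacement has exactly the same byte length as the original; any edit that changes the length forces everything after it to be rewritten. The function name `patch_in_place` and the sample data are hypothetical, and this is not a description of how any particular editor saves files.

```python
import os
import tempfile


def patch_in_place(path, offset, old, new):
    """Overwrite bytes at `offset` without copying the file.

    Only valid when `old` and `new` have the same length in bytes
    (not characters), so nothing after the edit has to move.
    """
    if len(old) != len(new):
        raise ValueError("in-place patch requires equal byte lengths")
    with open(path, "r+b") as f:
        f.seek(offset)
        if f.read(len(old)) != old:
            raise ValueError("unexpected content at offset")
        f.seek(offset)
        f.write(new)


# Small demonstration on a temporary file (hypothetical data).
data = b"<order><status>open</status></order>"
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(data)

patch_in_place(demo_path, data.index(b"open"), b"open", b"done")

with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
print(patched)
```

The save cost here is a few bytes written regardless of file size, which is the best case. Changing "open" to "closed" would not qualify, and the tool would have to rewrite the rest of the file or produce a new copy.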

Issues With Loading Large XML.

How does the editor actually handle large XML? If it loads the entire file into memory, then performance will be poor and there will be a limit on the size of file that can be loaded. If it loads only a small chunk at one time, what happens if you delete an element with many child elements that span multiple chunks? Will it detect the problem and cancel the operation or, worse, save the file with missing child elements?
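As a rough sketch of the alternative to loading the whole file, the following Python example streams through the XML with the standard library's `xml.etree.ElementTree.iterparse` and discards elements once they have been processed, so memory use stays roughly bounded. The function name `count_elements` and the sample document are assumptions for illustration; real editors may use quite different techniques.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Count elements named `tag` without keeping the whole tree in memory."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop everything built so far under the root; elements that are
            # still open remain referenced by the parser itself.
            root.clear()
    return count


# Hypothetical sample; a real source would be a path or an open file.
sample = b"<root><group><rec>1</rec><rec>2</rec></group><other/></root>"
rec_count = count_elements(io.BytesIO(sample), "rec")
print(rec_count)
```

The trade-off is the one raised above: because only part of the tree exists at any moment, operations that need the whole tree, such as deleting an element whose children span a very large part of the file, need extra care.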

Issues With Navigating Large XML.

How easy is it to navigate large XML? Are element collapse and expansion supported? What happens if you collapse an element with thousands, or even millions, of child elements? How are search results displayed? Is search and replace supported?
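To make the search question concrete, here is a minimal Python sketch that finds the byte offsets of a search term by memory-mapping the file with the standard `mmap` module rather than reading it all into memory. The function name `find_offsets`, the `limit` parameter, and the sample data are hypothetical, and this is a plain byte search that knows nothing about XML structure.

```python
import mmap
import os
import tempfile


def find_offsets(path, needle, limit=100):
    """Return up to `limit` byte offsets where `needle` occurs in the file."""
    offsets = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1 and len(offsets) < limit:
                offsets.append(pos)
                pos = mm.find(needle, pos + 1)
    return offsets


# Small demonstration on a temporary file (hypothetical data).
fd, search_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<r><id>1</id><id>2</id></r>")

hits = find_offsets(search_path, b"<id>")
os.remove(search_path)
print(hits)
```

Capping the number of results is one way to keep a search usable when a term matches millions of times; how to display those results is a separate design question.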

Single-Line Files.

Does the editor have difficulty displaying large XML if there are no line feeds in the file? Line feeds and other non-significant whitespace may have been omitted to reduce file size. When XML is transmitted over the Internet it typically does not have line feeds. Some editors, and particularly text editors, cannot place each element on a separate line if the XML does not have line feeds.
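One workaround is to add line feeds before opening the file. The Python sketch below copies a file in fixed-size chunks and inserts a newline wherever one tag directly follows another. It is a naive byte-level approach under stated assumptions: the encoding is ASCII-compatible (such as UTF-8), the added whitespace is not significant to the consumer of the file, and the sequence `><` does not occur inside CDATA sections or comments. The function name `break_lines` is hypothetical.

```python
import io


def break_lines(src, dst, chunk_size=1 << 20):
    """Copy `src` to `dst`, inserting a newline between adjacent tags."""
    prev_ends_with_gt = False  # did the previous chunk end with b">"?
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Handle a "><" pair that straddles two chunks.
        if prev_ends_with_gt and chunk.startswith(b"<"):
            dst.write(b"\n")
        dst.write(chunk.replace(b"><", b">\n<"))
        prev_ends_with_gt = chunk.endswith(b">")


# Demonstration with a tiny chunk size to exercise the chunk boundaries.
single_line = b"<a><b>x</b><c/></a>"
out = io.BytesIO()
break_lines(io.BytesIO(single_line), out, chunk_size=3)
broken = out.getvalue()
print(broken.decode("ascii"))
```

Because the file is processed a chunk at a time, memory use does not grow with file size, but the result is a second copy of the file on disk.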



Submitted by Bill Conniff, Founder of Xponent, on December 22, 2014


