Xponent Software White Paper

Conquering Memory and Performance Problems With Large XML Treeviews

May 26, 2009

Xponent developed a radically new type of treeview that completely eliminates the memory and performance problems that result from displaying a treeview with a large number of nodes. The architecture does not implement or build upon any existing treeview technology, including ones that do not hold the entire XML in memory. Conventional treeview controls hold the entire treeview in memory, such that the number of nodes that may be added is strictly a function of the amount of available memory. As nodes are added and available memory decreases, so does performance.

For developers reading this, we do not use a pre-built treeview control. Our control is a class that inherits from the Panel class and adds to the panel a textbox for each XML node. So what appears to be a treeview is actually just a bunch of textboxes with plus and minus sign images for expanding and collapsing elements. The number of textboxes added is very small and is the number that are visible in the panel at its current size, typically twenty-five to fifty textboxes. Resizing the control by dragging the bottom frame down but not releasing it reveals there is no additional XML content -it is not filled in with more XML nodes until the frame is released and the resize event is completed.

Existing Solutions

A widely used strategy, dynamic loading, has been around for at least ten years. It uses a conventional treeview control, but loads only the nodes in the first hierarchy under the root(depth=1), but none of their child nodes, which are dynamically loaded and unloaded upon demand when a parent node is expanded or collapsed. A problem with this design occurs if an element has a large number of child nodes, because all of the child nodes have to be read to ensure that each loaded element includes its end tag. This can significantly degrade performance. Similarly, expansion can take a long time to read and load all the child nodes. As a demonstration, download the simple.xml file in zip format(1.5MB zipped, 21 MB XML) and try loading it into a treeview with whatever XML software you have. Simple.xml has just one child node under the root, but it has one million elements(grandchildren). Check out the data files from stackoverflow.com referenced on our Links page. These files have tens of millions of elements all at the first level under the root. XMLMax is the only XML editor that can open files this large.

Dynamic loading requires that the displayed elements are initially collapsed since child nodes are not loaded. This may detract from the usability of the treeview for some users. It also presents the problem of preserving the collapsed-expanded state of an element; when a user collapses an element, all its child nodes are removed from the tree and the expanded state of its descendant elements is lost. Thus, when the element is expanded again all its child nodes are collapsed, even if some or all of them had ben left expanded.

Notepad and some other text editors display XML in a collapsable tree. They do not use dynamic loading but also do not use an XML parser so they cannot detect if the file has an XML syntax error. Given a big enough XML file text editors will not use a treeview. Notepad++ took over one minute to read and fully load simple.xml, while XMLMax took just three seconds.

The Ultimate Treeview

Xponent's treeview works by adding only the number of nodes it takes to fill the treeview panel, displaying them fully expanded. It can display any fragment of an XML document even if the fragment contains unbalanced elements, i.e., not all the elements have matching start tags and end tags. It does not have to read the XML until it reaches the end of a balanced fragment. Thus, if the treeview window displays twenty-five XML nodes, then twenty-five nodes are read from the XML and added to the tree which is then ready for user interaction.

When an XML file is first opened, the entire file is parsed(with the .Net XmlReader). The user is able to navigate the treeview for the portion of the XML that has been read while the remainder of the file is read in the background(asynchronously). During this initial parsing, the file is segmented at arbitrary elements(the segments do not need to be balanced as mentioned above) and only one segment is held in memory.

Because so few nodes are in the treeview at a given time, the amount of memory needed to display and navigate any XML is strictly a factor of the content of the XML. Approximately ten thousand XML nodes are held in a memory buffer. Thus, the memory requirement is the size in bytes of the XML contained in the buffer, plus a small amount of overhead. It would be unusual for the memory requirement to exceed twenty megabytes unless the XML content includes very large text nodes. Each time the tree is navigated, such as page down, page up, each textbox is updated with the next set of nodes. This completely eliminates both the memory and performance problems, but it introduces a number of new ones.

Even with such a custom treeview, Xponent had to resolve a number of issues:

Locating the child nodes of an element when it is expanded or collapsed.
Tracking the expansion state of each element such that it is re-displayed in its prior state.
Tracking the hierarchical depth of each node so that it may be properly indented in the treeview.
Fast, bi-directional paging and scrolling from any location in the XML.

XML data present the addtional problem of determining when XML nodes need to be added to the treeview in response to user navigation. To address these issues, Xponent designed a buffering and caching system based on accessing and tracking byte offsets into the raw XML. The only drawbacks of the design are:

There is no smooth scrolling while the scrollbar is dragged.
This is because the treeview contains only the number of XML nodes that fit on one page. In the case of a scroll drag event, it cannot be re-populated until the drag event is completed.
There may be a brief delay during user navigation when the end of a buffer is reached.
This is because the buffer must be cleared and re-loaded. The delay ranges from a small fraction of one second to two seconds. The delay for a given datasource is constant and does not vary with how far the treeview is scrolled. Thus, dragging the thumb from the beginng of the tree to the end should not take more than two seconds. Paging and scrolling within the range of the current buffer takes a small fraction one second so there is no noticeable delay. Buffer size ranges from ten to fifty thousand nodes.