XBIS is an encoding format for XML documents that is fully convertible to and from text, with information set equivalence between the original document text and the regenerated document text. It's intended for use in transmitting XML documents between application components, and is therefore designed for processing speed. The current Java language implementation offers several times the performance of SAX2 parsers working from text documents, across a wide range of document types and sizes and across the JVMs tested, while also providing a substantial reduction in document size for most types of XML documents.

XBIS is the successor to the XML Stream (XMLS) project, which was developed in 2000-2001 to address the issue of processing overhead when sending XML documents between application components. The XMLS implementation was designed for use with document models, so it took a document model as input and generated a document model as output. Even though it demonstrated much better performance than alternatives such as Java serialization or conversion to and from text, it offered no direct performance comparison to parsing XML text because of its document-model focus.

The XBIS format is very similar to that used by XMLS, with only a few differences. The main difference from XMLS is in the internal details of the implementation, which has been restructured to provide better support for working with parser event streams as input and output. The current XBIS implementation takes a SAX2 event stream as input when generating the binary encoding of a document, and generates a SAX2 event stream as output when decoding the binary representation. Other parser APIs are likely to be supported in the future, along with direct conversion to and from document model representations of documents. Even though both XMLS and XBIS are designed around the needs of Java applications, the XBIS format is a general one that can be implemented in any general-purpose programming language.
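
To make the SAX2-in, SAX2-out design concrete, the following is a minimal sketch that uses only standard JDK (JAXP) classes and no XBIS code. It parses XML text into a SAX2 event stream and regenerates text from those events. The class name SaxRoundTrip and the method roundTrip are hypothetical names chosen for illustration. An XBIS encoder would presumably sit where the TransformerHandler does here, consuming the same ContentHandler callbacks, but that is an assumption and not a description of the actual XBIS API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical illustration class; not part of XBIS.
public class SaxRoundTrip {

    /** Parses XML text into SAX2 events and regenerates text from those events. */
    public static String roundTrip(String xml) throws Exception {
        // Producer side: a namespace-aware SAX2 parser reading document text.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // Consumer side: an identity TransformerHandler that turns the
        // incoming SAX2 events back into text. A binary encoder would be
        // another ContentHandler plugged in at this point (assumption).
        SAXTransformerFactory transformerFactory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = transformerFactory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Wire the event stream from parser to consumer and run the parse.
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<order id=\"7\"><item>pen</item></order>"));
    }
}
```

Because both ends speak SAX2, the component in the middle can be swapped without touching the parser or the application code, which is the property the event-stream design relies on.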
The encoding format is based on long-established computing principles and has no known "intellectual property" encumbrances. The format is made available to any interested parties free of restriction. The Java implementation code itself uses the commercial-friendly BSD license.

The main features that distinguish XBIS from other techniques for representing XML documents are:
The current version of XBIS is designed to preserve the canonical form of text XML documents converted to and from XBIS. This represents something of a change in direction from earlier versions, which focused more on the XML Infoset. The reason for the change is that the Infoset is an abstraction with very little direct application, while canonicalization is a practical concern for many uses of XML (including XML Signature). The change of focus has had very little direct impact on the XBIS format and implementation. It has, however, eliminated the need to extend XBIS with support for some rarely used aspects of the Infoset.

See the XBIS encoding format description and the information on the performance tests on this site. The latest performance results are included in the article Improve XML transport performance, Part 2 at IBM developerWorks.