The ASPToday Article, October 24, 2000
XML documents arrange data in a hierarchical, tree-like form, and there are two popular ways to access them. One is the Document Object Model (DOM), a W3C standard that provides objects representing the parts of an XML document, such as the root, nodes and attributes. The other, SAX (the Simple API for XML), is perhaps less well known and is the focus of this article. SAX is an open standard that was originally implemented primarily in Java, but Microsoft now has a COM implementation of SAX 2.0 that can be used from any COM-compliant language, such as Visual Basic or C++. SAX gives developers an event-based model.
To work with the examples in this article, and with Microsoft's implementation of SAX 2.0 in general, you need Microsoft XML Version 3 (MSXML3.dll) properly installed on your machine. It is available for download from Microsoft's web site. You will also need a COM-compliant development tool such as Visual Basic (here we focus on VB only). Finally, you will need a text editor to create the sample XML documents.
Typically, an XML document is parsed in one of two ways: DOM or SAX. DOM arranges the XML document as a tree in memory. The tree can be navigated from the root to the child nodes, and you can access any node at random to manipulate its value. As mentioned earlier, the DOM provides objects representing the root, nodes, attributes and other parts of an XML document. To get an idea of how the DOM can be used, consider the following code fragment written in Visual Basic:
    Dim mydoc As New MSXML2.DOMDocument
    Dim oNode As MSXML2.IXMLDOMNode

    ' Load synchronously so the document is complete before we walk it
    mydoc.async = False
    mydoc.Load (App.Path & "\" & "catalog.xml")

    For Each oNode In mydoc.childNodes
        MsgBox "Node Text :" & oNode.Text
        MsgBox "Node Type :" & oNode.nodeType
    Next
Here, the XML document is loaded from disk using the DOMDocument object's Load method. Each node of the document can then be accessed through the childNodes collection. The node object has properties and methods that provide more information about the node, such as its type and text. You can access any node in the collection at random. You can also change node text, add new elements, remove existing elements and so on.
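The editing operations just mentioned can be sketched as follows. This is only an illustrative sketch, not part of the sample project: the procedure name EditCatalogWithDom and the note element are hypothetical, and the XML is loaded from an inline string so that the fragment does not depend on a file on disk. It assumes a project reference to Microsoft XML Version 3.

```vb
' Illustrative sketch only. Assumes a project reference to MSXML3.dll.
' EditCatalogWithDom and the <note> element are hypothetical names.
Public Sub EditCatalogWithDom()
    Dim oDoc As MSXML2.DOMDocument
    Dim oBook As MSXML2.IXMLDOMElement
    Dim oNote As MSXML2.IXMLDOMElement

    Set oDoc = New MSXML2.DOMDocument
    oDoc.async = False

    ' loadXML parses a string and returns False if the XML is not well formed
    If Not oDoc.loadXML("<catalog><book price=""150"">" & _
                        "<author>Author 1</author></book></catalog>") Then
        MsgBox "Could not parse sample: " & oDoc.parseError.reason
        Exit Sub
    End If

    ' The first child of the root element is the <book> element
    Set oBook = oDoc.documentElement.firstChild

    ' Change the text of an existing node (the <author> element)
    oBook.firstChild.Text = "Renamed Author"

    ' Add a new element under <book>
    Set oNote = oDoc.createElement("note")
    oNote.Text = "Added through the DOM"
    oBook.appendChild oNote

    ' Change an attribute value, then remove the element added above
    oBook.setAttribute "price", "175"
    oBook.removeChild oNote

    ' Show the modified document
    MsgBox oDoc.xml
End Sub
```

All of these changes happen on the in-memory tree, which is exactly what the DOM keeps and SAX does not.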
Now let us consider the SAX approach. Instead of building a tree, the parser reads the document and notifies you as it finds the elements and symbols that make up the document. This is the event-based interface adopted by SAX: it is the parser that tells you about events. When the SAX2 parser processes an XML document, it generates a sequence of events such as startDocument(), startElement(), startElement(), characters(), endElement(), endElement(), endDocument(). You provide the functionality that is invoked as these events occur. Here is a small code fragment illustrating the use of SAX:
    Dim myReader As MSXML2.VBSAXXMLReader

    ' Classes you will develop, which implement the
    ' corresponding interfaces from MSXML
    Dim myContentHandler As mycontenthandlerclass
    Dim myErrorHandler As myerrorhandlerclass

    Set myContentHandler = New mycontenthandlerclass
    Set myErrorHandler = New myerrorhandlerclass

    ' Now hand your objects to the parser
    Set myReader = New MSXML2.VBSAXXMLReader
    Set myReader.contentHandler = myContentHandler
    Set myReader.errorHandler = myErrorHandler

    ' The parser parses the document and calls methods of the objects you
    ' provided, such as startDocument(), startElement(), endElement() and
    ' endDocument(), as it encounters the various parts of the document.
    myReader.parseURL (App.Path & "\" & "catalog.xml")
Note that SAX does not provide a way to change the document; it provides read-only access.
As stated earlier, Microsoft has implemented SAX as COM-compliant interfaces. These interfaces are available to any COM-compliant tool such as Visual Basic or C++. Here we deal with the VB interfaces only. Microsoft XML Version 3 provides 10 SAX interfaces in all. The ones that matter for this article are IVBSAXXMLReader (implemented by the parser; loads and parses the document), IVBSAXContentHandler (receives the content of the document), IVBSAXErrorHandler (receives parsing errors) and IVBSAXAttributes (gives access to the attributes of an element).
Before starting any coding, let us examine these important interfaces in more detail.
The IVBSAXContentHandler interface deals with the actual content of the XML document, that is, its elements, character data and so on.
The main methods we are interested in are:
startDocument: This method is called once at the start of the document, and it is the first method of the ContentHandler interface to be called. The method signature is as follows:
    Private Sub IVBSAXContentHandler_startDocument()
    End Sub
endDocument: This method is called once, when the parser reaches the end of the document. The method signature is as follows:
    Private Sub IVBSAXContentHandler_endDocument()
    End Sub
startElement: This method is called at the beginning of each element. It provides the list of attributes contained in the start tag. The syntax of the method is as follows:
    Private Sub IVBSAXContentHandler_startElement(ByVal strNamespaceURI As String, _
            ByVal strLocalName As String, ByVal strQName As String, _
            ByVal oAttributes As MSXML2.IVBSAXAttributes)
    End Sub
endElement: This method is called when the parser encounters the end of an element. Its syntax is as follows:
    Private Sub IVBSAXContentHandler_endElement(ByVal strNamespaceURI As String, _
            ByVal strLocalName As String, ByVal strQName As String)
    End Sub
characters: This method receives character data from the XML document. It is called between the startElement and endElement methods. Its signature is as follows:
    Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)
    End Sub
The IVBSAXErrorHandler interface provides a way to track errors that occur during parsing. Errors can be of three types: recoverable errors, fatal errors and warnings, reported through the error, fatalError and warning methods respectively.
At present, the MSXML parser treats all errors as fatal errors. The signatures of these methods are as follows:
    Private Sub IVBSAXErrorHandler_error(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByVal strError As String, ByVal nErrorCode As Long)
    End Sub

    Private Sub IVBSAXErrorHandler_fatalError(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByVal strError As String, ByVal nErrorCode As Long)
        MsgBox "Fatal Error"
    End Sub

    Private Sub IVBSAXErrorHandler_warning(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByVal strError As String, ByVal nErrorCode As Long)
    End Sub
Here, the locator object provides information about the column number and line number within the XML document where the error occurred.
The IVBSAXAttributes interface provides methods that give information about the attributes of an element. It is implemented by the parser itself, which passes an object of this type to startElement. Through it we can read attributes that have explicit values, whether set in the start tag or defaulted in the DTD. However, attributes declared as #IMPLIED with no value set in the start tag are not available. The sample code in this article uses its getValueFromName method, which returns the value of an attribute given its namespace URI and local name.
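When the attribute names are not known in advance, the attribute list can also be walked by index. The helper below is a minimal sketch under stated assumptions: ShowAttributes is a hypothetical name, and it assumes the length, getLocalName and getValue members of IVBSAXAttributes behave like their counterparts in the SAX2 Attributes interface, with zero-based indexes. It would be called from startElement, passing along the oAttributes argument supplied by the parser.

```vb
' Hypothetical helper, not part of the sample project.
' Call it from IVBSAXContentHandler_startElement with the
' oAttributes argument that the parser passes in.
Private Sub ShowAttributes(ByVal oAttributes As MSXML2.IVBSAXAttributes)
    Dim i As Long
    Dim strList As String

    ' Assumption: indexes are zero-based and run to length - 1
    For i = 0 To oAttributes.length - 1
        strList = strList & oAttributes.getLocalName(i) & " = " & _
                  oAttributes.getValue(i) & vbCrLf
    Next

    ' Show one message box listing every attribute of the element
    If Len(strList) > 0 Then MsgBox strList
End Sub
```

For a book element of the sample catalog this would list the price, category and isbn attributes with their values.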
The IVBSAXXMLReader interface is implemented by the parser (the VBSAXXMLReader object) and provides document loading and parsing capabilities. It has several properties that correspond directly to the other interfaces you implement; you must set these properties to objects of your classes before parsing the document. The properties and methods used in this article are contentHandler, errorHandler and parseURL.
Now let us try out these methods and properties. To get a feel for how DOM and SAX differ, we will develop the application first using DOM and then using SAX. For our example we will take a simple XML document, stored as catalog.xml:
    <?xml version="1.0"?>
    <catalog>
        <book price="150" category="programming" isbn="1111111111">
            <author>Author 1</author>
            <subject>Title 1</subject>
        </book>
        <book price="200" category="programming" isbn="2222222222">
            <author>Author 2</author>
            <subject>Title 2</subject>
        </book>
        <book price="99" category="engineering" isbn="3333333333">
            <author>Author 3</author>
            <subject>Title 3</subject>
        </book>
    </catalog>
We will perform the following operations on the XML document: calculate the total price of the books in the programming category, and display the title and author of each of those books.
Now let us examine the code that uses DOM to do this:
    Dim mydoc As MSXML2.DOMDocument
    Dim oAttb As MSXML2.IXMLDOMNamedNodeMap
    Dim oNodes As MSXML2.IXMLDOMNodeList
    Dim temp As Object
    Dim Price As Currency

    Set mydoc = New MSXML2.DOMDocument
    ' Load synchronously so the document is complete before we query it
    mydoc.async = False
    mydoc.Load "catalog.xml"

    Set oNodes = mydoc.getElementsByTagName("book")
    For Each temp In oNodes
        Set oAttb = temp.Attributes
        ' The attribute name and value must match catalog.xml exactly
        If oAttb.getNamedItem("category").nodeValue = "programming" Then
            Price = Price + CCur(oAttb.getNamedItem("price").nodeValue)
            ' In catalog.xml the first child is <author>, the second <subject>
            MsgBox "Book Title :" & temp.childNodes(1).childNodes(0).nodeValue & _
                   vbCrLf & "Author :" & temp.childNodes(0).childNodes(0).nodeValue
        End If
    Next
    MsgBox "Total price of programming books is " & "$" & Price
Our sample XML document has only a few records, but in reality it may contain thousands. When we use DOM, the whole XML document is loaded into memory. This is unnecessary here, as we do not need the entire document at once, nor do we need to access nodes at random.
Now, let us turn our attention to SAX. We will first look at the general steps required to program SAX interfaces.
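Programming the SAX interfaces from Visual Basic involves the following steps: add a reference to Microsoft XML Version 3 (MSXML3.dll) to the project; create class modules that implement the handler interfaces; create a VBSAXXMLReader object and assign objects of your classes to its contentHandler and errorHandler properties; and finally call parseURL to parse the document.
In our sample project the mycontenthandler class implements the IVBSAXContentHandler interface, while the myerrorhandler class implements the IVBSAXErrorHandler interface.
Note that because we are implementing interfaces in VB, you must add an empty method definition for every method of the interface, even those you have no interest in coding.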
We will perform the following tasks on the document: total the price of the books in the programming category, and display the title and author of each of those books.
Here is the code that totals the price of the books in the desired category and displays their titles and authors:
    Dim m_Reader As MSXML2.VBSAXXMLReader
    Dim m_ContentHandler As mycontenthandler
    Dim m_ErrorHandler As myerrorhandler

    Set m_ContentHandler = New mycontenthandler
    Set m_ErrorHandler = New myerrorhandler

    ' Must match the value of the category attribute in catalog.xml
    m_ContentHandler.Catagory = "programming"

    Set m_Reader = New MSXML2.VBSAXXMLReader
    Set m_Reader.contentHandler = m_ContentHandler
    Set m_Reader.errorHandler = m_ErrorHandler
    m_Reader.parseURL ("catalog.xml")

    MsgBox "Total price of " & m_ContentHandler.Catagory & " books is " & _
           "$" & m_ContentHandler.Price
Here we create an XMLReader object and assign objects of our implemented classes to its contentHandler and errorHandler properties. Finally we start parsing the XML file with the parseURL method. Unlike DOM, the SAX parser does not need to hold the entire document in memory at once, so with huge documents processing can be faster and more memory efficient.
The main screen looks like this:
As stated earlier, we need to implement the IVBSAXContentHandler interface. The entire class module looks like this:
    Option Explicit

    Implements IVBSAXContentHandler

    Private m_count As Integer
    Private m_price As Currency
    Private m_catagory As String
    Private cur_ele As String            ' local name of the element being read
    Private isReqdCatagory As Boolean    ' True while inside a matching <book>

    Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)
        Dim strText As String

        ' The parser also reports line breaks, tabs and spaces between
        ' elements; strip them and ignore chunks that are whitespace only
        strText = Trim$(Replace(Replace(Replace(strChars, vbCr, ""), _
                  vbLf, ""), vbTab, ""))
        If Len(strText) = 0 Then Exit Sub

        If isReqdCatagory Then
            ' In catalog.xml the title is held in the <subject> element
            Select Case cur_ele
                Case "subject"
                    MsgBox "Title :" & strText
                Case "author"
                    MsgBox "Author :" & strText
            End Select
        End If
    End Sub

    Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS _
            As MSXML2.IVBSAXLocator)
    End Property

    Private Sub IVBSAXContentHandler_endDocument()
    End Sub

    Private Sub IVBSAXContentHandler_endElement(ByVal strNamespaceURI _
            As String, ByVal strLocalName As String, ByVal strQName As String)
        ' Text that follows an end tag does not belong to that element
        cur_ele = ""
        If strLocalName = "book" Then
            isReqdCatagory = False
        End If
    End Sub

    Private Sub IVBSAXContentHandler_endPrefixMapping(ByVal strPrefix _
            As String)
    End Sub

    Private Sub IVBSAXContentHandler_ignorableWhitespace(ByVal strChars _
            As String)
    End Sub

    Private Sub IVBSAXContentHandler_processingInstruction(ByVal strTarget _
            As String, ByVal strData As String)
    End Sub

    Private Sub IVBSAXContentHandler_skippedEntity(ByVal strName As String)
    End Sub

    Private Sub IVBSAXContentHandler_startDocument()
    End Sub

    Private Sub IVBSAXContentHandler_startElement(ByVal strNamespaceURI _
            As String, ByVal strLocalName As String, ByVal strQName _
            As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
        cur_ele = strLocalName
        If strLocalName = "book" Then
            ' The attribute name must match catalog.xml exactly
            If oAttributes.getValueFromName("", "category") = Catagory Then
                isReqdCatagory = True
                Price = Price + CCur(oAttributes.getValueFromName("", "price"))
                Count = Count + 1
            Else
                isReqdCatagory = False
            End If
        End If
    End Sub

    Private Sub IVBSAXContentHandler_startPrefixMapping(ByVal strPrefix _
            As String, ByVal strURI As String)
    End Sub

    Public Property Get Count() As Variant
        Count = m_count
    End Property

    Public Property Let Count(ByVal vNewValue As Variant)
        m_count = vNewValue
    End Property

    Public Property Get Price() As Variant
        Price = m_price
    End Property

    Public Property Let Price(ByVal vNewValue As Variant)
        m_price = vNewValue
    End Property

    Public Property Get Catagory() As Variant
        Catagory = m_catagory
    End Property

    Public Property Let Catagory(ByVal vNewValue As Variant)
        m_catagory = vNewValue
    End Property
Note how empty methods are added because we are implementing the interface. Also note that when SAX reads the document it reads new line characters, tab characters, spaces and so on as well; these are reported by the characters method. You can filter them out using Visual Basic string manipulation functions.
Now let us look at the implementation of the IVBSAXErrorHandler interface:
    Option Explicit

    Implements IVBSAXErrorHandler

    Private Sub IVBSAXErrorHandler_error(ByVal oLocator _
            As MSXML2.IVBSAXLocator, ByVal strError _
            As String, ByVal nErrorCode As Long)
        MsgBox "Recoverable Error"
    End Sub

    Private Sub IVBSAXErrorHandler_fatalError(ByVal oLocator _
            As MSXML2.IVBSAXLocator, ByVal strError _
            As String, ByVal nErrorCode As Long)
        MsgBox "Fatal Error At Col " & CStr(oLocator.columnNumber) & _
               " Row " & CStr(oLocator.lineNumber)
    End Sub

    Private Sub IVBSAXErrorHandler_warning(ByVal oLocator _
            As MSXML2.IVBSAXLocator, ByVal strError _
            As String, ByVal nErrorCode As Long)
    End Sub
Even though we have implemented all three error methods, the current version of the parser always calls the fatalError method, no matter what the actual error is. Inside the error handler methods you can use the Locator object to find the column number and line number at which the error occurred.
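Finally, the code that actually loads the document from disk and parses it looks like this:
    Private Sub Command1_Click()
        ' Suppress run-time errors so a failed parse does not stop the program
        On Error Resume Next
        ListView1.ListItems.Clear

        Dim m_Reader As MSXML2.VBSAXXMLReader
        Dim m_ContentHandler As mycontenthandler
        Dim m_ErrorHandler As myerrorhandler

        Set m_ContentHandler = New mycontenthandler
        Set m_ErrorHandler = New myerrorhandler

        ' Must match the value of the category attribute in catalog.xml
        m_ContentHandler.Catagory = "programming"

        Set m_Reader = New MSXML2.VBSAXXMLReader
        Set m_Reader.contentHandler = m_ContentHandler
        Set m_Reader.errorHandler = m_ErrorHandler
        m_Reader.parseURL ("catalog.xml")

        MsgBox "Total price of " & m_ContentHandler.Catagory & " books is " & _
               "$" & m_ContentHandler.Price
    End Sub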
As before, we create an XMLReader object, assign objects of our implemented classes to its contentHandler and errorHandler properties, and start parsing the XML file with the parseURL method.
Our XML document does not contain any errors. You can try removing the end tags of some elements to see how the fatal error is raised.
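The On Error Resume Next line in the click handler suggests that a failed parse also surfaces as a Visual Basic run-time error in the calling code. Assuming that is so, a small wrapper lets the caller tell a successful parse from a failed one instead of ignoring the error. This is a sketch only, and TryParse is a hypothetical name that is not part of the sample project.

```vb
' Hypothetical helper, not part of the sample project.
' Assumption: parseURL raises a run-time error when parsing fails.
Private Function TryParse(ByVal oReader As MSXML2.VBSAXXMLReader, _
        ByVal strURL As String) As Boolean
    On Error GoTo ParseFailed

    oReader.parseURL strURL
    TryParse = True
    Exit Function

ParseFailed:
    ' The error handler object has already been told about the problem;
    ' here we only report failure to the caller
    TryParse = False
End Function
```

The caller could then show the total price only when TryParse returns True, so that a document with a missing end tag does not produce a misleading partial total.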
We have covered the basics of SAX and how it differs from DOM in parsing XML documents. SAX has its limitations, in that we cannot write to an XML document, but if we want to parse a large XML document for selective data retrieval it can be quicker and lighter on memory than using DOM.
ASPToday is brought to you by Wrox Press (http://www.wrox.com/). Copyright © 2001 Wrox Press. All Rights Reserved.