      The ASPToday Article
October 24, 2000
SAX 2.0 Programming With Visual Basic
by Bipin Joshi
 
CATEGORY:  XML/Data Transfer  
ARTICLE TYPE: Overview
   
    ABSTRACT  
 

There are a number of ways you can parse an XML document, and sometimes using the DOM is not the quickest or most efficient method. Bipin Joshi compares DOM to the new COM implementation of SAX and discusses the advantages of both methods.

   
                   
   
 
    ARTICLE
SAX 2.0 Programming With Visual Basic 

XML documents arrange data in a hierarchical, tree-like form, and there are two popular ways to access them. One is the Document Object Model (DOM), a W3C standard, which provides objects representing the various parts of an XML document such as the root, nodes and attributes. The other, SAX (the Simple API for XML), is perhaps less well known and is the focus of this article. SAX is an open standard that was originally implemented primarily in Java, but Microsoft now has a COM implementation of SAX 2.0 which can be used from any COM-compliant language such as Visual Basic or C++. SAX provides developers with an event-based model.

To work with the examples in this article, and with Microsoft's implementation of SAX 2.0 in general, you need Microsoft XML Version 3 (MSXML3.dll) properly installed on your machine. This is available for download from Microsoft's web site. You will also need a COM-compliant development tool such as Visual Basic (here we will focus on VB only). Finally, you will need a text editor to create sample XML documents.

Difference between DOM and SAX

Typically an XML document is parsed in one of two ways: DOM or SAX. DOM arranges the XML document as a tree in memory. The tree can be navigated from the root to the child nodes, and you can access any node at random to manipulate its value. As mentioned earlier, the DOM provides objects representing the root, nodes, attributes and so on. To get an idea of how the DOM can be used, consider the following code fragment written in Visual Basic:

Dim mydoc As New MSXML2.DOMDocument
Dim oNode As MSXML2.IXMLDOMNode
' Load synchronously so the document is complete before we walk it
mydoc.async = False
mydoc.Load App.Path & "\" & "catalog.xml"
For Each oNode In mydoc.childNodes
   MsgBox "Node Text :" & oNode.Text
   MsgBox "Node Type :" & oNode.nodeType
Next

Here, the XML document is loaded from disk using the DOMDocument object's Load method. Each top-level node of the document can then be reached through the childNodes collection. The node object has properties and methods that provide more information about the node, such as its type and text. You can access any node in the collection at random, and you can also change node text, add new elements, remove existing elements and so on.

Now, let us consider the SAX approach. Instead of building a tree, the parser reads through the document and notifies you as it finds the various elements and symbols that form it. SAX adopts this event-based interface: it is the parser that tells you about events. When the SAX2 parser processes an XML document, it generates a sequence of events such as startDocument(), startElement(), startElement(), characters(), endElement(), endElement(), endDocument(). You provide the functionality that is invoked as each event occurs. Here is a small code fragment illustrating the use of SAX:

Dim myReader As MSXML2.VBSAXXMLReader

' Classes you will develop, implementing the
' corresponding interfaces from MSXML
Dim myContentHandler As mycontenthandlerclass
Dim myErrorHandler As myerrorhandlerclass

Set myContentHandler = New mycontenthandlerclass
Set myErrorHandler = New myerrorhandlerclass

' Now give the parser your objects
Set myReader = New MSXML2.VBSAXXMLReader
Set myReader.contentHandler = myContentHandler
Set myReader.errorHandler = myErrorHandler

' The parser will parse the document and call methods of the objects you
' provided, such as startDocument(), startElement(), endElement() and
' endDocument(), as and when it encounters the various elements.
myReader.parseURL App.Path & "\" & "catalog.xml"

Note that SAX does not provide a way to change the document – it provides just read–only access to the document.

Advantages of SAX

Limitations of SAX

Programming for SAX through VB

As stated earlier, Microsoft has implemented SAX as COM-compliant interfaces. These interfaces are available to any COM-compliant tool such as Visual Basic or C++. Here, we will deal with the VB interfaces only. Microsoft XML Version 3 provides 10 SAX interfaces in all. The important ones, each covered in its own section below, are IVBSAXContentHandler, IVBSAXErrorHandler, IVBSAXAttributes and IVBSAXXMLReader.

Before actually starting any coding let us examine these important interfaces in more detail.

The IVBSAXContentHandler Interface

This interface deals with the actual content of the XML document, i.e. the elements, character data and so on.

The main methods we are interested in are:

startDocument method: This method is called once at the start of the document, and it is the first method called in the ContentHandler interface. The method signature is as follows:

Private Sub IVBSAXContentHandler_startDocument()

End Sub

endDocument method: This method is called once, when the parser reaches the end of the document. The method signature is as follows:

Private Sub IVBSAXContentHandler_endDocument()

End Sub

startElement method: This method is called at the beginning of each element. It provides a list of the attributes contained within the start tag. The method signature is as follows:

Private Sub IVBSAXContentHandler_startElement(ByVal strNamespaceURI As String, _
                       ByVal strLocalName As String, ByVal strQName As String, _
                                   ByVal oAttributes As MSXML2.IVBSAXAttributes)

End Sub

endElement method: This method is called when the parser encounters the end of an element. The method signature is as follows:

Private Sub IVBSAXContentHandler_endElement(ByVal strNamespaceURI As String, _
                       ByVal strLocalName As String, ByVal strQName As String)

End Sub

characters method: This method receives character data from the XML document. It is called between the startElement and endElement calls for an element. The method signature is as follows:

Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)

End Sub

The IVBSAXErrorHandler Interface

This interface provides a way to track errors that occur during the parsing operation. The errors can be of three types, each reported through its own method: recoverable errors (error), fatal errors (fatalError) and warnings (warning).

Presently, the MSXML parser treats all errors as fatal errors. The signatures of these methods are as follows:

Private Sub IVBSAXErrorHandler_error(ByVal oLocator As MSXML2.IVBSAXLocator, _
                           ByVal strError As String, ByVal nErrorCode As Long)

End Sub

Private Sub IVBSAXErrorHandler_fatalError(ByVal oLocator As MSXML2.IVBSAXLocator, _
                                ByVal strError As String, ByVal nErrorCode As Long)
MsgBox "Fatal Error"

End Sub

Private Sub IVBSAXErrorHandler_warning(ByVal oLocator As MSXML2.IVBSAXLocator, _
                             ByVal strError As String, ByVal nErrorCode As Long)

End Sub

Here, the locator object provides information about the column number and line number within the XML document where the error occurred.
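As a minimal sketch of how the locator might be used, the helper below turns the three arguments that every error-handler method receives into one readable message. FormatParseError is a hypothetical name introduced here for illustration; it is not part of MSXML, and the code assumes the project references Microsoft XML v3.0. Only the lineNumber and columnNumber properties described above are used.

```vb
' Hypothetical helper (not part of MSXML): builds a readable message from the
' arguments passed to error, fatalError or warning.
Public Function FormatParseError(ByVal oLocator As MSXML2.IVBSAXLocator, _
                                 ByVal strError As String, _
                                 ByVal nErrorCode As Long) As String
    Dim strWhere As String

    ' Guard against a missing locator rather than assuming one is always supplied
    If oLocator Is Nothing Then
        strWhere = "unknown position"
    Else
        strWhere = "line " & oLocator.lineNumber & _
                   ", column " & oLocator.columnNumber
    End If

    ' The error code is shown in hexadecimal, the usual form for COM error codes
    FormatParseError = "Error &H" & Hex$(nErrorCode) & " at " & strWhere & _
                       ": " & strError
End Function
```

Inside IVBSAXErrorHandler_fatalError it could be called as MsgBox FormatParseError(oLocator, strError, nErrorCode).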

The IVBSAXAttributes Interface

This interface provides methods which give information about the attributes of an element. The interface is implemented by the parser itself. Using this interface we can read attributes whose values are explicitly set in the start tag or supplied by a default declared in the DTD. However, attributes declared with #IMPLIED but having no value set in the start tag will not be available. The important members of this interface include the length property and the getLocalName, getValue, getIndexFromName and getValueFromName methods.
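To make the attribute list concrete, here is a small sketch that walks every attribute the parser reports for one element. DumpAttributes is a hypothetical helper name, not an MSXML member; the sketch assumes a project reference to Microsoft XML v3.0 and relies only on the length property and the index-based getLocalName and getValue methods.

```vb
' Hypothetical helper: prints each attribute of the current element to the
' Immediate window. Intended to be called from
' IVBSAXContentHandler_startElement, passing its oAttributes argument.
Public Sub DumpAttributes(ByVal oAttributes As MSXML2.IVBSAXAttributes)
    Dim i As Long

    ' Attributes are indexed from 0 to length - 1
    For i = 0 To oAttributes.length - 1
        Debug.Print oAttributes.getLocalName(i) & " = " & _
                    oAttributes.getValue(i)
    Next i
End Sub
```

For a start tag carrying price, category and isbn attributes this would print three name/value lines. The order in which they appear should not be relied upon.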

The IVBSAXXMLReader Interface

This interface is implemented by the parser and provides document loading and parsing capabilities. It has several properties which correspond directly to the other interfaces you implement, and you must set these properties to objects of your classes before parsing the document. The important members of the reader include the contentHandler, errorHandler and baseURL properties and the parse and parseURL methods.

SAX in Action

Now, let us try out these methods and properties. To get a feel for how DOM and SAX differ, we will first develop the application using DOM and then using SAX. For our example we will take a simple XML document which is stored as catalog.xml:

<?xml version="1.0"?>
<catalog>
   <book price="150" category="programming" isbn="1111111111">
      <author>Author 1</author>
      <subject>Title 1</subject>
   </book>
   <book price="200" category="programming" isbn="2222222222">
      <author>Author 2</author>
      <subject>Title 2</subject>
   </book>
   <book price="99" category="engineering" isbn="3333333333">
      <author>Author 3</author>
      <subject>Title 3</subject>
   </book>
</catalog>

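We will perform the following operations on the XML document: calculate the total price of the books in the programming category, and display the title and author of each of those books.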

Now let us examine the code that uses DOM for the calculation of the total price of books from the programming category:

Dim mydoc As MSXML2.DOMDocument
Dim oAttb As MSXML2.IXMLDOMNamedNodeMap
Dim oNodes As MSXML2.IXMLDOMNodeList
Dim temp As Object
Dim Price As Currency

Set mydoc = New MSXML2.DOMDocument
' Load synchronously so the document is fully parsed before we query it
mydoc.async = False
mydoc.Load "catalog.xml"

Set oNodes = mydoc.getElementsByTagName("book")
For Each temp In oNodes
    Set oAttb = temp.Attributes
    ' The attribute name and value must match catalog.xml exactly
    If oAttb.getNamedItem("category").nodeValue = "programming" Then
        Price = Price + CCur(oAttb.getNamedItem("price").nodeValue)
        ' In catalog.xml <author> is the first child of <book> and
        ' <subject> (the title) is the second
        MsgBox "Book Title :" & temp.childNodes(1).childNodes(0).nodeValue & _
            vbCrLf & "Author :" & temp.childNodes(0).childNodes(0).nodeValue
    End If
Next
MsgBox "Total price of programming books is " & "$" & Price

Our sample XML document has only a few records, but in reality it might contain thousands. When we use DOM the whole XML document is loaded into memory. This is unnecessary here, as we don't need the entire document at once, nor do we need to access nodes at random.

Now, let us turn our attention to SAX. Programming the SAX interfaces through Visual Basic follows the pattern sketched earlier: write classes that implement the handler interfaces, create a reader object, give the reader your handler objects, and start the parse.

In our sample project the mycontenthandler class implements the IVBSAXContentHandler interface, while the myerrorhandler class implements the IVBSAXErrorHandler interface.

Note that because we are implementing interfaces in VB, you must add an empty definition for every method of the interface, even for the methods you are not interested in coding.

We will perform the following tasks on the document:

  1. Load the document from disk
  2. Parse the document
  3. Receive event notifications
  4. Handle events
  5. Read document contents
  6. Trap errors that occur during parsing
  7. Display the document content on a Form

Here is the code to total the price of books in the desired category and display their titles and authors:

Dim m_Reader As MSXML2.VBSAXXMLReader
Dim m_ContentHandler As mycontenthandler
Dim m_ErrorHandler As myerrorhandler

Set m_ContentHandler = New mycontenthandler
Set m_ErrorHandler = New myerrorhandler

' Must match the category attribute value used in catalog.xml
m_ContentHandler.Catagory = "programming"
Set m_Reader = New MSXML2.VBSAXXMLReader
Set m_Reader.contentHandler = m_ContentHandler
Set m_Reader.errorHandler = m_ErrorHandler

m_Reader.parseURL ("catalog.xml")
MsgBox "Total price of " & m_ContentHandler.Catagory & " books is " & _
                                           "$" & m_ContentHandler.Price

Here, we have created an XMLReader object. We then set its contentHandler and errorHandler properties to objects of our implementing classes. Finally we start parsing the XML file using the parseURL method. Unlike DOM, the SAX parser does not need to load the entire document at once, so with huge documents the processing will be faster and more memory efficient.

[Screenshot: the main screen of the sample application]

As stated earlier, we need to implement the IVBSAXContentHandler interface. The entire class module looks like this:

Option Explicit

Implements IVBSAXContentHandler

Private m_count As Integer
Private m_price As Currency
Private m_catagory As String
Private cur_ele As String
Private isReqdCatagory As Boolean

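Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)
Dim strText As String

' Strip line breaks, tabs and spaces so that whitespace-only chunks are ignored
strText = Trim$(Replace(Replace(Replace(strChars, vbCr, ""), vbLf, ""), vbTab, ""))

If Len(strText) > 0 And isReqdCatagory Then
    Select Case cur_ele
        Case "subject"   ' <subject> holds the book title in catalog.xml
            MsgBox "Title :" & strChars
        Case "author"
            MsgBox "Author :" & strChars
    End Select
End If

End Sub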



Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS _
                                             As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()
    
End Sub

Private Sub IVBSAXContentHandler_endElement(ByVal strNamespaceURI _
 As String, ByVal strLocalName As String, ByVal strQName As String)
If strLocalName = "book" Then

End If

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(ByVal strPrefix _
                                                         As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(ByVal strChars _
                                                           As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(ByVal strTarget _
                                     As String, ByVal strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(ByVal strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

End Sub

Private Sub IVBSAXContentHandler_startElement(ByVal strNamespaceURI _
            As String, ByVal strLocalName As String, ByVal strQName _
              As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
' Remember the current element so the characters method knows its context
cur_ele = strLocalName

If strLocalName = "book" Then
    ' The attribute is spelled "category" in catalog.xml
    If oAttributes.getValueFromName("", "category") = Catagory Then
        isReqdCatagory = True
        Price = Price + CCur(oAttributes.getValueFromName("", "price"))
        Count = Count + 1
    Else
        isReqdCatagory = False
    End If
End If

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(ByVal strPrefix _ 
                                    As String, ByVal strURI As String)

End Sub

Public Property Get Count() As Variant
Count = m_count

End Property

Public Property Let Count(ByVal vNewValue As Variant)
m_count = vNewValue

End Property

Public Property Get Price() As Variant
Price = m_price
End Property

Public Property Let Price(ByVal vNewValue As Variant)
m_price = vNewValue
End Property

Public Property Get Catagory() As Variant
Catagory = m_catagory

End Property

Public Property Let Catagory(ByVal vNewValue As Variant)
m_catagory = vNewValue

End Property

Note how empty methods are added because we are implementing the interface. Also note that when SAX reads the document it reads new line characters, tab characters, spaces and so on as well, and these are reported by the characters method. You can filter these characters out using the Visual Basic string manipulation functions.
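One way to do that filtering is sketched below, using nothing but the built-in Len, Mid$ and Select Case. IsWhitespaceOnly is a hypothetical helper name introduced for illustration. It treats spaces, tabs, carriage returns and line feeds as whitespace, which is an assumption about what needs to be ignored in a document like catalog.xml.

```vb
' Hypothetical helper: returns True when the chunk passed to the characters
' method contains nothing but spaces, tabs and line-break characters.
' An empty string also counts as whitespace-only.
Public Function IsWhitespaceOnly(ByVal strChars As String) As Boolean
    Dim i As Long

    IsWhitespaceOnly = True
    For i = 1 To Len(strChars)
        Select Case Mid$(strChars, i, 1)
            Case " ", vbTab, vbCr, vbLf
                ' Whitespace: keep scanning
            Case Else
                ' Found real content, so stop at once
                IsWhitespaceOnly = False
                Exit Function
        End Select
    Next i
End Function
```

The characters method could then begin with If IsWhitespaceOnly(strChars) Then Exit Sub, so that only real text reaches the rest of the handler.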

Now let us look at implementation of the IVBSAXErrorHandler interface:

Option Explicit

Implements IVBSAXErrorHandler

Private Sub IVBSAXErrorHandler_error(ByVal oLocator _
            As MSXML2.IVBSAXLocator, ByVal strError _
                 As String, ByVal nErrorCode As Long)
MsgBox "Recoverable Error"

End Sub

Private Sub IVBSAXErrorHandler_fatalError(ByVal oLocator _
                 As MSXML2.IVBSAXLocator, ByVal strError _
                      As String, ByVal nErrorCode As Long)
MsgBox "Fatal Error At Col " + CStr(oLocator.columnNumber) + _
                           " Row " + CStr(oLocator.lineNumber)

End Sub

Private Sub IVBSAXErrorHandler_warning(ByVal oLocator _
              As MSXML2.IVBSAXLocator, ByVal strError _
                   As String, ByVal nErrorCode As Long)

End Sub 

Even though we have implemented all three error methods, the current version of the parser will always call the fatalError method, no matter what the actual error is. Inside the error handler methods you can use the Locator object to find the column number and line number at which the error occurred.

Finally, the code that actually loads document from the disk and parses it looks like this:

Private Sub Command1_Click()

On Error Resume Next
ListView1.ListItems.Clear

Dim m_Reader As MSXML2.VBSAXXMLReader
Dim m_ContentHandler As mycontenthandler
Dim m_ErrorHandler As myerrorhandler

Set m_ContentHandler = New mycontenthandler
Set m_ErrorHandler = New myerrorhandler

Set m_Reader = New MSXML2.VBSAXXMLReader
Set m_Reader.contentHandler = m_ContentHandler
Set m_Reader.errorHandler = m_ErrorHandler

m_Reader.parseURL ("catalog.xml")


End Sub

Here, we have created an XMLReader object. We then set its contentHandler and errorHandler properties to objects of our implementing classes. Finally we start parsing the XML file using the parseURL method.

Our XML document does not contain any errors. You can try removing the end tags of some elements to see how the fatal error is raised.
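When experimenting with a deliberately broken document, it can help to know whether the parse finished at all. The sketch below wraps the call in an ordinary VB error trap. TryParse is a hypothetical name, and the sketch assumes that a failed parse surfaces in VB as a run-time error raised by parseURL (in addition to the fatalError callback). That behaviour should be checked against the MSXML build you have installed. The reader is typed as the IVBSAXXMLReader interface discussed earlier, so any reader object exposing it can be passed in.

```vb
' Hypothetical wrapper: returns True if the document parsed to the end.
' Assumes the handlers have already been assigned to the reader.
Public Function TryParse(ByVal oReader As MSXML2.IVBSAXXMLReader, _
                         ByVal strURL As String) As Boolean
    On Error GoTo ParseFailed

    oReader.parseURL strURL
    TryParse = True
    Exit Function

ParseFailed:
    ' Assumption: an aborted parse arrives here as a VB run-time error
    Debug.Print "Parse failed: " & Err.Description
    TryParse = False
End Function
```

A caller might write If Not TryParse(m_Reader, "catalog.xml") Then MsgBox "The document could not be parsed".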

Summary

We have covered the basics of SAX and how it differs from DOM in parsing XML documents. SAX has its limitations, in that we cannot write to an XML document, but if we want to parse a large XML document for selective data retrieval it is quicker than using DOM.

 
 
   
  Related ASPToday Articles
   
  • SAX and Microsoft XML Parser 3.0 (April 23, 2001)
  • Multiple Source Documents with XSLT (January 16, 2001)
  • Microsoft’s Next Generation XML Parser version 3.0 (June 13, 2000)
      Related Sources
     
  • MSXML 3.0 Parser Download: http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp

     
    ASPToday is brought to you by Wrox Press (http://www.wrox.com/). Please see our terms and conditions and privacy policy.
    Please report any website problems to webmaster@asptoday.com. Copyright © 2001 Wrox Press. All Rights Reserved.