Wrox Press
ASPToday
ASPToday Subscriber's Article
Here We Go Again: Part 4 - browser compatibility issues and data formats
by Alex Homer
Categories: XML/Data Transfer, Site Design, Scripting
Article Rating: 3 (3 raters)
Published on March 6, 2003
 
 
Abstract
In this 5-part series, published over the next five weeks, Alex Homer takes a broad look at some of the ways that the Web has turned the development world around full circle, and (more importantly) some of the ways that we can be ready for the next revolution.

In this penultimate article, Alex looks at the problem of browser incompatibility and at how XML is likely to become the de-facto persistence format for data in the medium term.
 

Article Information
Author Alex Homer
Chief Technical Editor John R. Chapman
Indexer Adrian Axinte
Reviewed by Andrew Krowczyk

 
Article

Browser Compatibility

The original HTML specification was designed to support the "scientific information" approach, with a range of elements such as <p> for a paragraph, <ul> and <ol> for lists, <strong> for strong emphasis, <xmp> for example code and <dl>, <dt> and <dd> for lists of definitions and their descriptions. We've come a long way from this, and along the way we've "lost the plot" in some respects.

For example, the "heading" elements <h1> through to <h6> not only provide suitable visual formatting of headings and subheadings in the page, but they should also denote the actual structure of the page content – rather like the "outline view" in Microsoft Word and other word processors. But how many pages today follow this approach? Very few – which makes it impossible for any kind of automated page parsing system to pull out headings from the page.

Of course, HTML was bound to evolve over time to meet the requirements of users and developers. What was unfortunate was that the predominant browser manufacturers added elements and parsing features based more on their own perception of user requirements than on actual standards. And, of course, they all did it in a different way. OK, so the World-Wide Web Consortium (W3C) didn't exactly lead the way with prompt new specs, so you probably can't blame the manufacturers entirely.

Browser-Specific HTML

However, the result is that we've ended up with a mish-mash of confusing element types and attribute sets, which don't even produce the same effects on all the different browsers. A perfect example is the <layer> element, which was introduced in Netscape Navigator 4.0, and then dropped again in Netscape 6.0. The result is that we all ended up trying to write generic pages that avoid all these non-universal features, or multiple browser-specific pages that do use these features, or even pages that use client-side script to detect the capabilities of the browser and change their behavior as required.

There are some well-known ways to detect the features of different types of browser, and the browser sends an identifying user agent string to the server with each request, so it is possible to modify pages dynamically to suit that browser. The behavior of the ASP.NET server controls that we discussed earlier demonstrates how a page can change to suit specific types of client if required.

Server-side Browser Detection in ASP 3.0

ASP 3.0 and earlier ship with a component called BrowserType, which can be used to detect features of the current browser or user agent. It simply needs to be instantiated using the Server.CreateObject method, and then a range of properties can be accessed:

<%  
' instantiate BrowserType object
Set bc = Server.CreateObject("MSWC.BrowserType") 

' get the browser type and version
sUAType = bc.browser
sUAVer = bc.version  
%>

See the IIS help file (http://localhost/iishelp/) for more details.

Server-side Browser Detection in ASP.NET

ASP.NET makes it even easier to detect the browser type, and specific features of the browser. The Request object exposes a Browser property that can be queried to get a similar range of properties for the current client as are available through the ASP 3.0 BrowserType technique. For example, we can get the browser type and major version using:

' get the browser type and version
Dim sUAType As String = Request.Browser("Browser")
Dim sUAVer As String = Request.Browser("MajorVersion")

See the .NET SDK (search for Request.Browser), or the online version of the SDK, for more details.

Client-side Browser Feature Detection

The traditional way to identify a browser in client-side code is to query the properties of the navigator object that is exposed by the root window object. For example:

var sBrowserName = window.navigator.appName;
var sVersion = window.navigator.appVersion;

However, this requires some complex parsing and decision constructs to figure out which kind of features our client supports. An easier and generally more accepted approach is to check if the client implements various object model collections and/or methods.

Only Internet Explorer version 4.x and above implements the document.all collection for accessing the contents of a rendered page. And only Navigator 4.x implements the document.layers collection for accessing the <layer> elements contained in a page. Meanwhile, any browser that supports DOM Level 2 and CSS2 will support the document.getElementById method.

So, we can use detection code like the following – note that, as Internet Explorer 5.x implements the getElementById method, we have to check for this when looking for CSS2 and DOM2 compliant browsers:

if (document.all) {
  //
  // code for Internet Explorer 4.x and above goes here
  //
}
if (document.layers) {
  //
  // code for Netscape Navigator 4.x only goes here
  //
}
if (document.getElementById && !document.all) {
  //
  // code for DOM2 and CSS2 compliant clients goes here
  //
} 

Client-side Controls and Applets

Another area where incompatibility has grown to become a real problem is in the use of client-side controls and applets. These can either be downloaded from the server and automatically installed on the client, or the user can install them manually. Installing the browser itself often also installs a set of controls or objects automatically. For example, Internet Explorer installs ActiveX controls that can be used to display structured graphics and work with multimedia resources.

Some of the objects that are installed are universal, for example the control required to display Flash animations. But taking advantage of these kinds of features means assuming that the user is viewing the page in a graphical browser. They might not be. They could be using a specialist user agent of some kind, a text-based browser, a PocketPC, a cellular mobile phone, or even a Web-enabled microwave oven.

Going Back to a Universal Markup Standard

Despite some criticism earlier in this section of the article, the W3C have done a remarkable job in the last couple of years in producing up-to-date standards for HTML. They have also looked at the issues regarding the increased take-up of XML (Extensible Markup Language), and the processing of XML documents in parsers and through XSLT (Extensible Stylesheet Language Transformations). As a result of this, we now have a new standard called Extensible Hypertext Markup Language, or XHTML.

XHTML is a reformulation of the latest version of HTML (4.01) into an XML-compliant syntax, so that pages can be understood by current browsers but also be processed by XML-compliant parsers. You'll see in our example code that we use XHTML-compliant syntax, with a closing slash included in elements that traditionally did not have one (such as <br />, <hr /> and <input ... />).
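Because XHTML is XML-compliant, the quickest sanity check is simply to see whether a fragment parses as XML at all. As an illustrative sketch (not a substitute for a real validator), Python's built-in parser can make the point about those closing slashes:

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML (a prerequisite of XHTML)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# XHTML style: the empty <br /> element is explicitly closed
print(is_well_formed("<p>Line one<br />Line two</p>"))   # True

# Traditional HTML style: the unclosed <br> breaks XML parsing
print(is_well_formed("<p>Line one<br>Line two</p>"))     # False
```

A full checker such as the W3C validator tests far more than well-formedness, of course: element names, attributes and nesting rules against the XHTML DTD.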

XHTML allows us to provide documents in a format that actually does have a future, unlike traditional HTML which – though it will never fully disappear – has limitations that will prevent it meeting the needs of automated document processing, sensible indexing, and future techniques for information retrieval and analysis.

The Future?

There's no doubt that XHTML is the future, and that other standards and proposals linked to it will become more prominent as time goes by. One of these is the proposal for the division of all the available XHTML elements into separate modules. For example, all the elements concerned with creating (X)HTML-style tables will be in the "tables" module. Some modules will be mandatory for all clients, such as text, list and hyperlink. However, a client will be able to easily describe its capabilities by simply indicating which of the other six or so modules it does support.

Another standard that has been around for a while, but has not seen wide acceptance so far, is the Resource Description Framework (RDF). This provides techniques for using XML-compliant documents to describe a resource such as a Web site or individual Web pages. It is far more expressive than the current <meta> element approach we use now, and can offer extra benefits such as inheritance from other documents. The problem is that, being an XML-based system, it really needs the web pages it will describe to be XML-compliant as well to allow some automated indexing to take place. If only we'd had XHTML a few years ago!

The other aspect of browser compatibility, at least in the last year or so, is that it all seems to finally be coming together. As we've noted earlier in this article, the current releases of graphical browsers from Netscape, Mozilla, Microsoft and Opera are all broadly compliant with HTML 4.01, XHTML, CSS2 and the XML/HTML DOM version 2.0. The signs are good for the future, but the thorn in the side today is how to continue support for the old versions in your pages.

What Should I Do Now?

The first thing is easy. Write XHTML-compliant web pages. ASP.NET makes it easier because all the server controls generate XHTML-compliant output. It's not hard to get into the habit of using XHTML either, and there are several tools available that you can use to check your pages. A good example is the W3C validator page at http://validator.w3.org/, or the validation tools available from Mozquito Technologies at http://www.mozquito.com/.

Another point is to look at how you use block formatting and headings in your pages. Why not use the heading elements <h1> to <h6> to provide a guide to the structure of a page? You can use CSS or embedded <font> elements to control the style of the text within them, and it sure provides a better indication of structure and content than using <div> or <span> for all of them.

Provide Structure and Content Hints for Automated Parsing

Also consider using the <p> element as a block-enclosing element, as it was designed for, rather than just to create a "blank line" at specific points. This way, you can describe each paragraph using a title attribute, as well as providing a kind of hierarchy to the content of the page:

<h1>Main Title</h1>
<p title="precis">This is the precis of the document</p>
<h2>Overview</h2>
<p title="overview">This is an overview of the content</p>
<h3>Overview of Section 1</h3>
<p title="overview1">This is an overview of section one </p>
... etc...

Keep Up To Date with New Standards

It's vital to keep up to date with new standards whatever part of the industry you work in. However, as far as XML-based technologies go, the constant and regular changes do make this hard. Take your eye off XML Schemas for a few months and it's all changed completely when you come back to it again.

Thankfully, most of the important standards have matured and are changing less violently now. Examples are XML itself, XHTML and XSLT. XML Schemas is settling down nicely too, though you may find this less vital if you predominantly work only with Web pages rather than generic XML documents.

Certainly, if you are not familiar with it, learn about XSLT. It provides great opportunities for adding style to documents, analyzing documents and even modifying them. However, there is an even newer standard that is emerging – namely XML Query Language, or just XQuery. At the moment (early 2003), the churn rate is high on this proposed standard. However, it will be a strong contender in the future as it provides techniques for accessing the content of XHTML and XML documents using a SQL-style syntax. We'll look at XQuery in a little more depth later in this article.

Data Formats

Most of the earliest mainframe database programs used simple formats for storing data in a disk file (or even on tape). Each "row" or "record" in a data table is treated as a fixed-length string, and a separate set of information files holds details of where each "column" or "field" starts and finishes within that text string. The database program can then extract the relevant parts of the string and present them as required, on demand.

However, this technique proved to be inefficient in many ways, not least in the way that text values had to be padded to fill the specified column width, and the fact that numbers stored as text often consumed much more space than they would if treated as binary values. An alternative approach soon became the norm, storing binary values that represent both text and numbers.

As a simplified example, the text columns can have some specific "end of field" delimiter, and only use the amount of space actually required. And numbers can be stored in binary form, with the number of bytes used defined by the data type for the column.
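The difference can be sketched in a few lines of illustrative Python; the 20-character column width and 4-byte integer type here are invented for the example:

```python
import struct

def pack_fixed(name, emp_no):
    # Fixed-length record: pad the text column to its full 20-character
    # width and store the number as fixed-width decimal digits.
    return ("%-20s%6d" % (name, emp_no)).encode("ascii")

def pack_compact(name, emp_no):
    # Delimited/binary record: end the text column at a delimiter byte
    # and store the number as a 4-byte binary integer.
    return name.encode("ascii") + b"\x00" + struct.pack("<i", emp_no)

fixed = pack_fixed("Smith", 106)
compact = pack_compact("Smith", 106)
print(len(fixed))    # 26 bytes, mostly padding
print(len(compact))  # 10 bytes: 5 for the name, 1 delimiter, 4 binary
```

The same record costs less than half the space in the second form, and the saving grows with wider columns and larger tables.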

Going Relational

Of course, one of the main features of a database is to allow the data to be stored in separate tables that are linked together. This provides the most efficient storage approach, with the minimal amount of wasted space, whilst still preserving all the relationships between individual items of data. This is the well-known technique of normalization, which was formalized by E. F. Codd in his research papers on Relational Algebra back in 1970 (he probably wore flared trousers at the time as well).

Early approaches for supporting this relational model were through separate files that contain indexes for the tables and the links between the rows. In more recent relational databases, these separate files have been replaced by system tables that contain the attributes or rules to be applied to the data tables, columns and their content.

Going Back to Text

We've thrown a lot of these advances away in the last couple of years by going back to a text-based data persistence format all over again. As you'll no doubt be aware, XML is to blame. Yet XML provides so many other compelling advantages that we really don't have any choice in the long term. The whole industry is embracing and adopting XML as the new data persistence and transmission format, and without doubt it is the way forward. Even a simple set of data rows becomes a self-describing text document:

<?xml version="1.0"?>
<employees>
  <employee>
    <last-name>Smith</last-name>
    <emp-no>106</emp-no>
  </employee>
</employees>

XML finally gives us the opportunity to achieve platform, application and operating system independence, as well as being ideal for transmission over the Internet via existing protocols. It's also easy to store as a disk file, transmit as a stream, and manipulate using a standard parser application.
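As a quick sketch of that last point, here the employee document shown above is loaded and queried with a standard parser (Python's built-in xml.etree module is used for illustration; any XML 1.0 parser offers the equivalent):

```python
import xml.etree.ElementTree as ET

# The same document could equally come from a disk file, a stream,
# or an HTTP response.
doc = """<?xml version="1.0"?>
<employees>
  <employee>
    <last-name>Smith</last-name>
    <emp-no>106</emp-no>
  </employee>
</employees>"""

root = ET.fromstring(doc)

# Walk the tree and pull out each employee's details.
for employee in root.findall("employee"):
    last_name = employee.findtext("last-name")
    emp_no = int(employee.findtext("emp-no"))
    print(last_name, emp_no)   # Smith 106
```

No knowledge of column widths, delimiters or binary layouts is needed: the markup itself describes the structure.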

Of course, there are issues here. Not only is the format of XML more verbose than binary-format data sets, but it also requires an all-new series of standards and techniques to support it and allow us to make use of it. These standards and techniques are (in most cases) in place today, however, and more continue to be developed to extend the usefulness of XML generally.

For example, the version 1.0 standard for XML itself is fixed, as is the standard for the XSLT stylesheet language. The version 1.0 standard for XML Schemas is also in place, allowing tight definitions of the structure, content and data types of an XML document. Work is progressing (at the time of writing) on other standards such as XML Query Language (XQuery).

The Future?

It does seem extremely likely, from public announcements made by W3C and most large software manufacturers, that XML will become the de-facto persistence format for data in the medium term. Already, most new applications can export data as XML, and many can read XML and use it as a data source.

In ASP 3.0, XML support is almost entirely implemented through the MSXML parser that is installed with IE5, or available separately from Microsoft (see http://msdn.microsoft.com/library/en-us/xmlsdk30/htm/xmmscxmloverview.asp). In ASP.NET, XML support is built into the class library in the System.Xml namespace and its related namespaces.

More than that, the ADO.NET data access objects (from System.Data and its related namespaces) also have XML support built right in. For example, the only format available for persisting the content of a DataSet to a stream or disk file is as XML. ADO.NET and System.Xml also provide a bridge between the relational and XML worlds via the XmlDataDocument object, which can expose its data content as either an XML document or a DataSet object.

Another area where XML is firmly established, and gaining ground, is in Web Services. These use a series of XML-based standards such as Simple Object Access Protocol (SOAP) and the Web Service Description Language (WSDL) to describe and implement Web Services, and the data they expose is also in XML format.

So, you could say that the future is already here now, and that all we have to do is grasp it. But there are even more exciting things appearing in the near and middle distance. An XML standard for generating graphics (Scalable Vector Graphics or SVG) is in place, as is the Synchronized Multimedia Integration Language (SMIL) for interactive audio-visual presentations that integrate streaming audio and video with images, text or any other media type.

XML Query Language

Probably the most exciting development as far as "data" is concerned is the work on XML Query Language (XQuery). While it's possible to extract and manipulate XML document content using XSLT now, there are several issues that make it a less than ideal solution in the widest context.

For example, XSLT is verbose, requiring a stylesheet containing templates to be created for even the simplest operation. It also depends on recursion, which frightens off many less experienced developers. And, more to the point, it bears absolutely no relationship at all to existing data query techniques such as SQL, which often acts as an effective barrier to entry for the database developer more familiar with existing and well-proved SQL-based techniques.

In XQuery, we can finally implement a query that follows (at least for most developers) a more natural style. In fact, it's almost a case of SELECT [elements] FROM [xml-document] WHERE [element-value] = "This Value". Or, to be more precise and use the current proposed syntax:

{
FOR $elem IN document("myfile.xml")//elem-name
  WHERE string($elem/@this-attr) = "This Value"
  RETURN string($elem/this-child-elem)
}

In XML terms, this simple XQuery iterates through a document named myfile.xml extracting all the elements named elem-name that have an attribute named this-attr with the value "This Value", and returning the value of a child element named this-child-elem from each one it finds. In SQL terms, it selects all the "rows" from the "column" named this-child-elem in a "table" named myfile.xml, where the "column" named this-attr has a value that is LIKE "This Value".
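To make that behavior concrete, here is an equivalent of the query written against Python's built-in ElementTree API; the document content is invented, and the element and attribute names are the placeholders from the XQuery above:

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <elem-name this-attr="This Value">
    <this-child-elem>first</this-child-elem>
  </elem-name>
  <elem-name this-attr="Other Value">
    <this-child-elem>second</this-child-elem>
  </elem-name>
</root>"""

root = ET.fromstring(doc)

# FOR $elem IN document(...)//elem-name
#   WHERE string($elem/@this-attr) = "This Value"
#   RETURN string($elem/this-child-elem)
results = [elem.findtext("this-child-elem")
           for elem in root.iter("elem-name")
           if elem.get("this-attr") == "This Value"]
print(results)   # ['first']
```

The XQuery version needs no host language at all: the query itself is the program, which is exactly the attraction for database developers.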

Native XML Support in Database Servers

Microsoft SQL Server 2000 and Oracle 9i both implement a "SQL/XML" technology, allowing SQL statements to return XML directly, and even the use of XML diffgrams to update data within the database. Microsoft has also publicly stated that future versions of SQL Server will implement "deep investment in XQuery and native XML support", as well as allowing managed code stored procedures to be created (so you will be able to execute XML-based queries within these stored procedures). Meanwhile, Oracle is already supporting XQuery through add-ons to its database (see http://otn.oracle.com/tech/xml/xmldb/content.html and http://otn.oracle.com/tech/xml/xmldb/htdocs/querying_xml.html).

Whether new databases will actually persist data as XML internally is not really an issue, however. It seems unlikely, due to the issues of verbosity and inefficiency we alluded to earlier on. But, of course, this isn't the important issue. We're interested in the interface with the data rather than what goes on inside, as long as externally our data stores can handle the newest techniques for accessing data.

Object-Oriented Data

What is exciting is how this support could become not only widespread, but universal. In other words, allowing XQuery-compliant code to access any database, and return data that is stored in any format. We're already seeing developments in "object-oriented data" techniques, where the data is an object that exposes properties rather than a two-dimensional data table.

Standards like SOAP and Web Services are designed to offer remote object access, and XML can provide persistence and transmission of objects using existing standards. So, maybe in time, we'll see development opportunities where the only consideration when using one database or another, irrespective of where it's located, will be in specifying the correct URL.

What Should I Do Now?

Again, the vital aspect is to keep up to date with new standards as they develop and are released. Obviously XQuery is going to be the major focus as far as data stores and data access are concerned. While the standard remains relatively volatile, practical work is limited to experimentation, but none of that effort will be wasted.

In general, today, the one area where you can be ready for the future is in the way you decide to persist, transmit and manipulate data in your applications. For example:

  • Persist any data you output to streams and disk files as XML
  • Provide an option to read any data you require as XML
  • Build in interoperability with Web Services where possible
  • Allow any data definitions you require to be made using an XML Schema
  • Make sure any rendered output is XHTML compliant

Next...

In Part 5, we finish up with a look at how application architecture has evolved, and processing location has changed, as we have moved onto the Web.

 
