Browser Compatibility
The original HTML specification was designed to support the
"scientific information" approach, with a range of elements such as
<p> for a paragraph, <ul> and <ol> for lists, <strong> for strong emphasis, <xmp> for example code, and <dl>, <dt> and <dd> for lists of definitions and
their descriptions. We've come a long way from this, and along the
way we've "lost the plot" in some respects.
For example, the "heading" elements <h1> through to <h6> not only provide suitable visual
formatting of headings and subheadings in the page, but they should
also denote the actual structure of the page content rather like
the "outline view" in Microsoft Word and other word processors. But
how many pages today follow this approach? Very few, which makes it
all but impossible for any kind of automated page-parsing system to
pull the headings out of a page reliably.
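When headings are used consistently, recovering that outline becomes trivial. As a rough sketch (the function name, regular expression and sample markup here are our own, not taken from any real tool), a few lines of script can list every heading with its level:

```javascript
// Build a simple outline from the <h1>..<h6> elements in an HTML string.
// This is an illustrative sketch only; a production parser would use a
// proper DOM rather than a regular expression.
function extractOutline(html) {
  var outline = [];
  var re = /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi;
  var match;
  while ((match = re.exec(html)) !== null) {
    outline.push({ level: parseInt(match[1], 10), text: match[2] });
  }
  return outline;
}

var page = "<h1>Main Title</h1><p>intro</p>" +
           "<h2>Overview</h2><p>details</p>";
var outline = extractOutline(page);
// outline is [{level: 1, text: "Main Title"}, {level: 2, text: "Overview"}]
```

A page built from anonymous <div> and <font> elements gives a tool like this nothing to work with, which is the point being made above.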
Of course, HTML was bound to evolve over time to meet the
requirements of users and developers. What was unfortunate was that
the predominant browser manufacturers added elements and parsing
features that were more based on their perception of user
requirements than on actual standards. And, of course, they all did
it in a different way. Admittedly, the World Wide Web Consortium (W3C) didn't exactly lead
the way with prompt new specifications, so you probably can't blame them entirely.
Browser-Specific HTML
However, the result is that we've ended up with a mish-mash of
confusing element types and attribute sets, which don't even produce
the same effects on all the different browsers. A perfect example is
the <layer> element, which was
introduced in Netscape Navigator 4.0, and then dropped again in
Netscape 6.0. The upshot is that we all ended up trying to write
generic pages that avoid all these non-universal features, or
multiple browser-specific pages that do use these features, or even
pages that use client-side script to detect the capabilities of the
browser and change their behavior as required.
There are some well-known ways to detect the features of
different types of browser. The browser sends an identifying
user agent string to the server with each request, so it is possible
to modify the page dynamically to suit that browser. The behavior of
the ASP.NET server controls that we discussed earlier demonstrates
how a page can change to suit specific types of client if
required.
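As a sketch of what server-side user agent detection involves (the sample strings are typical of browsers of this era, but the classification rules here are our own and far from complete), the server can inspect the string for well-known tokens:

```javascript
// Rough classification of a User-Agent header string.
// The rules below are illustrative only; real detection needs a much
// larger rule set, since vendors deliberately mimic each other's strings.
function classifyUserAgent(ua) {
  // IE identifies itself with an "MSIE n.n" token inside the string...
  var m = /MSIE (\d+)/.exec(ua);
  if (m) return { browser: "IE", majorVersion: parseInt(m[1], 10) };
  // ...while Netscape's version appears in the leading "Mozilla/n" token.
  m = /^Mozilla\/(\d+)/.exec(ua);
  if (m) return { browser: "Netscape", majorVersion: parseInt(m[1], 10) };
  return { browser: "unknown", majorVersion: 0 };
}

var ie = classifyUserAgent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
// ie is { browser: "IE", majorVersion: 6 }
var nav = classifyUserAgent("Mozilla/4.78 [en] (Windows NT 5.0; U)");
// nav is { browser: "Netscape", majorVersion: 4 }
```

Note that the IE string also begins with "Mozilla/", which is exactly the kind of mimicry that makes this approach fragile; the MSIE check has to come first.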
Server-side Browser Detection in ASP 3.0
ASP 3.0 and earlier ship with a component called BrowserType, which can be used to detect
features of the current browser or user agent. It simply needs to be
instantiated using the Server.CreateObject method, and then a range
of properties can be accessed:

<%
' instantiate the BrowserType object
Set bc = Server.CreateObject("MSWC.BrowserType")
' get the browser type and version
sUAType = bc.browser
sUAVer = bc.version
%>
See the IIS help file (http://localhost/iishelp/?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0l)
for more details.
Server-side Browser Detection in ASP.NET
ASP.NET makes it even easier to detect the browser type, and
specific features of the browser. The Request object exposes a Browser property that can be queried to get
a similar range of properties for the current client as are
available through the ASP 3.0 BrowserType technique. For example, we can
get the browser type and major version using:

' get the browser type and version
Dim sUAType As String = Request.Browser("Browser")
Dim sUAVer As String = Request.Browser("MajorVersion")
See the .NET SDK (search for Request.Browser), or its online version, for more details.
Client-side Browser Feature Detection
The traditional way to identify a browser in client-side code is
to query the properties of the navigator object that is exposed by the root
window object. For example:

var sBrowserName = window.navigator.appName;
var sVersion = window.navigator.appVersion;
However, this requires some complex parsing and decision
constructs to figure out which kind of features our client supports.
An easier and generally more accepted approach is to check if the
client implements various object model collections and/or
methods.
Only Internet Explorer version 4.x and above implements the document.all collection for accessing the
contents of a rendered page. And only Navigator 4.x implements the
document.layers collection for
accessing the <layer> elements
contained in a page. Meanwhile, any browser that supports DOM
Level 2 and CSS2 will support the document.getElementById method.
So, we can use detection code like the following. Note that, as
Internet Explorer 5.x also implements the getElementById method, we have to
exclude it when looking for CSS2- and DOM2-compliant browsers:

if (document.all) {
//
// code for Internet Explorer 4.x and above goes here
//
}
if (document.layers) {
//
// code for Netscape Navigator 4.x only goes here
//
}
if (document.getElementById && !document.all) {
//
// code for DOM2 and CSS2 compliant clients goes here
//
}
Client-side Controls and Applets
Another area where incompatibility has grown to become a real
problem is in the use of client-side controls and applets. These can
either be downloaded from the server and automatically installed on
the client, or the user can install them manually. Installing the
browser itself often also installs a set of controls or objects
automatically. For example, Internet Explorer installs ActiveX
controls that can be used to display structured graphics and work
with multimedia resources.
Some of the objects that are installed are universal, for example
the control required to display Flash animations. But taking
advantage of these kinds of features means assuming that the user is
viewing the page in a graphical browser. They might not be. They
could be using a specialist user agent of some kind, a text-based
browser, a PocketPC, a cellular mobile phone, or even a Web-enabled
microwave oven.
Going Back to a Universal Markup Standard
Despite some criticism earlier in this section of the article,
W3C have done a remarkable job in the last couple of years in
producing up-to-date standards for HTML. They have also looked at
the issues regarding the increased take-up of XML (Extensible Markup
Language), and the processing of XML documents in parsers and through
XSLT (Extensible Stylesheet Language Transformations). As a result of this, we now have
a new standard called Extensible Hypertext Markup Language, or XHTML.
XHTML is a reformulation of the latest version of HTML (4.01)
into an XML-compliant syntax, so that pages can be understood by
current browsers but also be processed by XML-compliant parsers.
You'll see in our example code that we use XHTML-compliant syntax,
with a closing slash included in elements that traditionally did not
have one (such as <br />, <hr /> and <input ... />).
XHTML allows us to provide documents in a format that actually
does have a future. Traditional HTML, though it will
never fully disappear, has limitations that will prevent it meeting
the needs of automated document processing, sensible indexing, and
future techniques for information retrieval and analysis.
The Future?
There's no doubt that XHTML is the future, and that other
standards and proposals linked to it will become more prominent as
time goes by. One of these is the proposal for the division of all
the available XHTML elements into separate modules. For example, all
the elements concerned with creating (X)HTML-style tables will be in
the "tables" module. Some modules will be mandatory for all clients,
such as text, list and hyperlink. However, a client will be able to
easily describe its capabilities by simply indicating which of the
other six or so modules it does support.
Another standard that has been around for a while, but has not
seen wide acceptance so far, is the Resource Description Framework (RDF). This provides
techniques for using XML-compliant documents to describe a resource
such as a Web site or individual Web pages. It is far more
expressive than the current <meta> element approach we use now,
and can offer extra benefits such as inheritance from other
documents. The problem is that, being an XML-based system, it really
needs the Web pages it will describe to be XML-compliant as well to
allow some automated indexing to take place. If only we'd had XHTML
a few years ago!
The other aspect of browser compatibility, at least in the last
year or so, is that it all seems to finally be coming together. As
we've noted earlier in this article, the current releases of
graphical browsers from Netscape, Mozilla, Microsoft and Opera are
all broadly compliant with HTML 4.01, XHTML, CSS2 and the XML/HTML
DOM version 2.0. The signs are good for the future, but the thorn in
the side today is how to continue support for the old versions in
your pages.
What Should I Do Now?
The first thing is easy. Write XHTML-compliant web pages. ASP.NET
makes it easier because all the server controls generate
XHTML-compliant output. It's not hard to get into the habit of using
XHTML either, and there are several tools available that you can use
to check your pages. A good example is the W3C validator page at http://validator.w3.org/?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0l,
or the validation tools available from Mozquito Technologies at http://www.mozquito.com/?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0l.
Another point is to look at how you use block formatting and
headings in your pages. Why not use the heading elements <h1> to <h6> to provide a guide to the
structure of a page? You can use CSS or embedded <font> elements to control the style
of the text within them, and it sure provides a better indication of
structure and content than using <div> or <span> for all of them.
Provide Structure and Content Hints for Automated Parsing
Also consider using the <p>
element as a block-enclosing element, as it was designed for, rather
than just to create a "blank line" at specific points. This way, you
can describe each paragraph using a title attribute, as well as providing a kind
of hierarchy to the content of the page: <h1>Main Title</h1>
<p title="precis">This is the precis of the document</p>
<h2>Overview</h2>
<p title="overview">This is an overview of the content</p>
<h3>Overview of Section 1</h3>
<p title="overview1">This is an overview of section one </p>
... etc...
Keep Up To Date with New Standards
It's vital to keep up to date with new standards whatever part of
the industry you work in. However, as far as XML-based technologies
go, the constant and regular changes do make this hard. Take your eye
off XML Schemas for a few months and it's all changed completely
when you come back to it again.
Thankfully, most of the important standards have matured and are
changing less violently now. Examples are XML itself, XHTML and
XSLT. XML Schemas is settling down nicely too, though you may find
this less vital if you predominantly work only with Web pages rather
than with generic XML documents.
Certainly, if you are not familiar with it, learn about XSLT. It
provides great opportunities for adding style to documents,
analyzing documents and even modifying them. However, there is an
even newer standard emerging: XML Query Language, or just XQuery. At the moment
(early 2003), the churn rate is high on this proposed standard.
However, it will be a strong contender in the future as it provides
techniques for accessing the content of XHTML and XML documents
using a SQL-style syntax. We'll look at XQuery in a little more
depth later in this article.
Data Formats
Most of the earliest mainframe database programs used simple
formats for storing data in a disk file (or even on tape). Each
"row" or "record" in a data table is treated as a fixed-length string,
and a separate set of information files holds details of where each
"column" or "field" starts and finishes within that text string. The
database program can then extract the relevant parts of the string
and present them as required, on demand.
However, this technique proved to be inefficient in many ways,
not least in the way that text values had to be padded to fill the
specified column width, and the fact that numbers often consumed
much more space than is required by treating them as binary values.
An alternative approach soon became the norm, storing binary values
that represent both text and numbers.
As a simplified example, the text columns can have some specific
"end of field" delimiter, and only use the amount of space actually
required. And numbers can be stored in binary form, with the number
of bytes used defined by the data type for the column.
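The two storage layouts described above can be sketched in a few lines of code. The two-column record layout here (a 10-character last name and a 5-character employee number) is entirely hypothetical, chosen only to show the difference in wasted space:

```javascript
// Fixed-length storage: every field is padded to its full column width,
// and a separate layout definition says where each column starts and ends.
var layout = [{ name: "lastName", start: 0, length: 10 },
              { name: "empNo",    start: 10, length: 5 }];

function readFixedRecord(record, columns) {
  var row = {};
  for (var i = 0; i < columns.length; i++) {
    var c = columns[i];
    // strip the padding that the fixed-width format forces on us
    row[c.name] = record.substr(c.start, c.length).replace(/^ +| +$/g, "");
  }
  return row;
}

var record = "Smith     " + "  106";   // 15 characters, mostly padding
var fixed = readFixedRecord(record, layout);
// fixed is { lastName: "Smith", empNo: "106" }

// Delimited storage: fields only occupy the space they actually need.
var delimited = "Smith|106".split("|"); // 9 characters for the same data
```

The same trade-off applies to numbers: stored as text, "106" costs one byte per digit, while a binary representation uses a fixed number of bytes defined by the column's data type regardless of the value.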
Going Relational
Of course, one of the main features of a database is to allow the
data to be stored in separate tables that are linked together. This
provides the most efficient storage approach, with the minimal
amount of wasted space, whilst still preserving all the
relationships between individual items of data. This is the
well-known technique of normalization, which was formalized by E. F.
Codd in his research papers on Relational Algebra back in 1970 (he probably wore
flared trousers at the time as well).
Early approaches for supporting this relational model were
through separate files that contain indexes for the tables and the
links between the rows. In more recent relational databases, these
separate files have been replaced by system tables that contain the
attributes or rules to be applied to the data tables, columns and
their content.
Going Back to Text
We've thrown a lot of these advances away in the last couple of
years by going back to a text-based data persistence format all over
again. As you'll no doubt be aware, XML is to blame. Yet XML
provides so many other compelling advantages, that we really don't
have any choice in the long term. The whole industry is embracing
and adopting XML as the new data persistence and transmission
format, and without doubt it is the way forward. For example:

<?xml version="1.0"?>
<employees>
<employee>
<last-name>Smith</last-name>
<emp-no>106</emp-no>
</employee>
</employees>
XML finally gives us the opportunity to achieve platform,
application and operating system independence, as well as being
ideal for transmission over the Internet via existing protocols.
It's also easy to store as a disk file, transmit as a stream, and
manipulate using a standard parser application.
Of course, there are issues here. Not only is the format of XML
more verbose than binary-format data sets, but it also requires an
all-new series of standards and techniques to support it and allow
us to make use of it. These standards and techniques are (in most
cases) in place today, however, and more continue to be developed to
extend the usefulness of XML generally.
For example, the version 1.0 standard for XML itself is fixed, as
is the standard for the XSLT stylesheet language. The version 1.0
standard for XML Schemas is also in place, allowing tight
definitions of the structure, content and data types of an XML
document. Work is progressing (at the time of writing) on other
standards such as XML Query Language (XQuery).
The Future?
It does seem extremely likely, from public announcements made by
W3C and most large software manufacturers, that XML will become the
de facto persistence format for data in the medium term. Already,
most new applications can export data as XML, and many can read XML
and use it as a data source.
In ASP 3.0, XML support is almost entirely implemented through
the MSXML parser that is installed with IE5, or available separately
from Microsoft (see http://msdn.microsoft.com/library/en-us/xmlsdk30/htm/xmmscxmloverview.asp?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0l).
In ASP.NET, XML support is built into the class library in the System.Xml namespace and its related
namespaces.
More than that, the ADO.NET data access objects (from System.Data and its related namespaces)
also have XML support built right in. For example, the only format
available for persisting the content of a DataSet to a stream or disk file is as XML.
ADO.NET and System.Xml also provide a
bridge between the relational and XML worlds via the XmlDataDocument object, which can expose its
data content as either an XML document or a DataSet object.
Another area where XML is firmly established, and gaining ground,
is in Web Services. These use a series of XML-based
standards such as Simple Object Access Protocol (SOAP) and the Web
Service Description Language (WSDL) to describe and implement Web
Services, and the data they expose is also in XML format.
So, you could say that the future is already here now, and that
all we have to do is grasp it. But there are even more exciting
things appearing in the near and middle distance. An XML standard
for generating graphics (Scalable Vector Graphics or SVG) is in place, as is
the Synchronized Multimedia Integration Language (SMIL)
for interactive audio-visual presentations that integrate streaming
audio and video with images, text or any other media type.
XML Query Language
Probably the most exciting development as far as "data" is
concerned is the work on XML Query Language (XQuery). While it's possible to
extract and manipulate XML document content using XSLT now, there
are several issues that make it a less than ideal solution in the
widest context.
For example, XSLT is verbose, requiring a stylesheet containing
templates to be created for even the simplest operation. It also
depends on recursion, which frightens off many less experienced
developers. And, more to the point, it bears absolutely no
relationship at all to existing data query techniques such as SQL,
which often acts as an effective barrier to entry for the database
developer more familiar with existing and well-proved SQL-based
techniques.
In XQuery, we can finally implement a query that follows (at
least for most developers) a more natural style. In fact, it's
almost a case of SELECT [elements] FROM
[xml-document] WHERE [element-value] = "This Value". Or, to
be more precise, using the currently proposed syntax:

{
FOR $elem IN document("myfile.xml")//elem-name
WHERE string($elem/@this-attr) = "This Value"
RETURN string($elem/this-child-elem)
}
In XML terms, this simple XQuery iterates through a document
named myfile.xml extracting all the
elements named elem-name that have an
attribute named this-attr with the
value "This Value", and returning the
value of a child element named this-child-elem from each one it finds. In
SQL terms, it selects all the "rows" from the "column" named this-child-elem in a "table" named myfile.xml, where the "column" named this-attr has a value that is LIKE "This Value".
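The iterate-filter-project behavior of that FOR/WHERE/RETURN pattern can also be mimicked over an in-memory structure. This is not XQuery, just a plain script sketch with invented data, to make the semantics concrete:

```javascript
// Plain-JavaScript sketch of the FOR ... WHERE ... RETURN pattern above.
// The array stands in for document("myfile.xml")//elem-name; the
// property names and values are entirely hypothetical.
var elements = [
  { thisAttr: "This Value",  thisChildElem: "first match" },
  { thisAttr: "Other Value", thisChildElem: "not returned" },
  { thisAttr: "This Value",  thisChildElem: "second match" }
];

var results = [];
for (var i = 0; i < elements.length; i++) {      // FOR $elem IN ...
  if (elements[i].thisAttr === "This Value") {   // WHERE string($elem/@this-attr) = ...
    results.push(elements[i].thisChildElem);     // RETURN string($elem/this-child-elem)
  }
}
// results is ["first match", "second match"]
```

The appeal of XQuery is that this familiar loop-and-filter (or SELECT ... WHERE) logic is expressed declaratively over real XML documents, rather than hand-coded against each data structure.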
Native XML Support in Database Servers
Microsoft SQL Server 2000 and Oracle 9i both implement a
"SQL/XML" technology, allowing SQL statements to return XML
directly, and even the use of XML diffgrams to update data within
the database. Microsoft has also publicly stated that future
versions of SQL Server will implement "deep investment in XQuery and
native XML support", as well as allowing managed code stored
procedures to be created (so you will be able to execute XML-based
queries within these stored procedures). Meanwhile, Oracle is
already supporting XQuery through add-ons to its database (see http://otn.oracle.com/tech/xml/xmldb/content.html?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0land
http://otn.oracle.com/tech/xml/xmldb/htdocs/querying_xml.html?WROXEMPTOKEN=2201316ZPmHzfBRI92b3wWKP0l).
Whether new databases will actually persist data as XML
internally is another question. It seems unlikely, due
to the issues of verbosity and inefficiency we alluded to earlier
on. But, of course, this isn't the important point. We're interested
in the interface with the data rather than what goes on inside, as
long as externally our data stores can handle the newest techniques
for accessing data.
Object-Oriented Data
What is exciting is how this support could become not only
widespread, but universal. In other words, allowing XQuery-compliant
code to access any database, and return data that is stored in any
format. We're already seeing developments in "object-oriented data"
techniques, where the data is an object that exposes properties
rather than a two-dimensional data table.
Standards like SOAP and Web Services are designed to offer remote
object access, and XML can provide persistence and transmission of
objects using existing standards. So, maybe in time, we'll see
development opportunities where the only consideration when using
one database or another, irrespective of where it's located, will be
in specifying the correct URL.
What Should I Do Now?
Again, the vital aspect is to keep up to date with new standards
as they develop and are released. Obviously XQuery is going to be
the major focus as far as data stores and data access are concerned.
While the standard is still relatively volatile, practical work is limited
to experimentation; none of that effort will be wasted, though.
In general, today, the one area where you can be ready for the
future is in the way you decide to persist, transmit and manipulate
data in your applications. For example:
- Persist any data you output to streams and disk files as XML
- Provide an option to read any data you require as XML
- Build in interoperability with Web Services where possible
- Allow any data definitions you require to be made using an XML
Schema
- Make sure any rendered output is XHTML compliant
Next...
In Part 5, we finish up with a look at how application
architecture has evolved, and processing location has changed, as we
have moved onto the Web.