Introduction
To enable a realistic assessment of each data access class we
would ideally be able to isolate the effects of .NET data processing
from all external factors. Such external factors can be network
latency, client overload, web server overload, or even factors
within the database e.g. utilization of indexes, parsing of SQL
statements etc. Unfortunately, while we can measure some (but not
all) of these items directly, it is much easier to simply try to
minimize their relative influence on our measurements and ensure
that the effect remains constant for all tested methods. A detailed
description of our methodology is shown under the "Testing considerations" heading.
Before the quantitative testing however, we will try and identify
situations where a choice between access methods can be made by
considering the whole application environment and data usage. The
result will be a flow diagram that can help you take the right
architectural decision.
System Requirements
To run the code supplied you will need:
- The .NET Framework version 1.0
- SQL Server 2000
- Microsoft Application center Test (ACT) - part of Visual
Studio .NET Enterprise
Testbed setup
- Hardware: DELL Dimension 4300, CPU: Intel Pentium 4 1.6GHz,
RAM: 130,064 KB, 20GB disk
- Software: Windows2000 Server SP2 OS
- Environment settings: system optimized for background
processes, IIS5 optimized for >100,000 hits/day
Qualitative comparison
Strictly speaking, when you see references to DataReader in documents, the class implied
is SqlDataReader. It is the
functionality of that class that we will be comparing to the DataSet class. A diagram will make the
subsequent definitions clearer:
As you can see, the bridge between a DataSet and SQL Server is the SqlDataAdapter, which enables the retrieving
and saving of data; a two-way process. The DataSet can contain more than one DataTable, with constraints and relations
between DataTables. It can therefore
be imagined as a mini-database in memory. In contrast, The SqlDataReader provides a forward-only,
read-only stream of data from the database, which does not get
inserted in any table structure. To create a SqlDataReader, you must call the ExecuteReader method of the SqlCommand object, rather than directly
using a constructor.
There are occasions where the choice between using the DataSet or SqlDataReader class should be clear - after
all, they were designed with different specifications in mind. To
determine which one to use when you design your application, start
by considering the functionality that is needed in the
application.
Use the DataSet in order to do the
following with your application:
- Manipulate data from multiple sources (for example, a mixture
of data from more than one database, from an XML file, and from a
spreadsheet).
- Use or modify the data's relationships. If the relationship
itself is not important then you can consider bringing tables
separately with a SqlDataReader,
however their data will not be connected. You could also
conceivably use a stored procedure or Views to bring together
one-to-one/one-to-many relationship data (less trips from web
server to db server), but more data will be transmitted.
- Exchange data between tiers or using an XML Web service.
Unlike the SqlDataReader, the DataSet can be passed to a remote client.
- Reuse the same set of rows to achieve a performance gain by
caching them (such as for sorting, searching, or filtering the
data).
- Perform a large amount of processing per row. Once the SqlDataReader starts acquiring the data,
no other operation can be performed on the SqlConnection other than closing it.
Extended processing on each row returned using a SqlDataReader ties up the connection
serving the SqlDataReader
exclusively. The sharing of connections with the SqlDataAdapter means that a benefit may be
derived by using the DataSet if much
processing per row is to be performed.
- Needs scalability as an absolute must. As noted above, holding
the connection exclusively makes the application more difficult to
scale up because connection contention might occur. However, see
the last point in SqlDataReader
paragraph below.
- Manipulate data using XML operations such as Extensible
Stylesheet Language Transformations (XSLT transformations) or
XPath queries. However, if you just intend to present the data as
XML then you might want to consider using the SqlDataReader with a stored procedure that
executes a "SELECT FOR XML"
statement.
Use the SqlDataReader in your
application if you:
- Do not need to cache the data, and processing is done on the
fly
- Are processing a set of results too large to fit into memory
- Need to quickly access data once, in a forward-only and
read-only manner, for example loading up a list of items to
present in a listbox or combo box. Usually, the database server is
more suited to retrieving data, so this should be done in
combination with stored procedures, if possible.
- Need minimal Web server CPU usage as an absolute must. The
SqlDataAdapter actually uses the
SqlDataReader to fill a DataSet. Therefore, the performance gained
by using the SqlDataReader instead
of the DataSet is all the extra
processor cycles consumed to fill the DataSet - instead the data is fed directly
to your application.
- Holding the connection exclusively implies connection
contention when scaled up. However, this is conditional upon the
complexity of the actual data retrieval, i.e. the faster retrieval
of the SqlDataReader might actually
result in better scalability because the connection is held for
much less time.
In life nothing is ever clear-cut - there certainly are
situations where both data access methods would be applicable. For
example:
- If you are looking for a specific value from a specific row
multiple times (assuming this value cannot be processed via SQL
statements). You can use multiple SqlDataReader calls that quickly close the
connections, or open a DataSet,
create an index in memory, and search for all the wanted values in
disconnected mode.
- If reading the data is part of some larger operation that may
be better performed by opening a DataSet after a SqlDataReader, or just a DataSet right from the beginning.
For these sorts of situations it is useful to have a quantitative
assessment of each individual method's performance. It is
important to emphasize, however, that absolute performance should
only form part of the wider consideration - business rules,
maintenance aspects or simply functionality might well be overriding
factors in your decision-making.
Testing Considerations
Although strictly speaking, data access techniques should be
tested directly, rather than behind a Web server, the latter is
realistic for common application scenarios. In any case, because we
are comparing relative performance of these data access techniques,
the overhead of testing with a Web server will be same for both, and
it makes testing via Microsoft Application Center Test much easier.
Note that we are not trying to minimize data access time per se, so
we are not using optimized code such as exposing a stored procedure
as an XML web service for example. Ideally however, we need to
isolate the DataSet / Reader effects, so need to maximize their
relative % effect, i.e. minimize random network latency, effects
deriving from SQL statement parsing, unrelated ASP.NET operations
etc.. In practical terms, this implies the following:
Minimizing database server load:
- Use stored procedures to ensure that the SQL statement is
precompiled
- Select only a small number of columns in our SELECT statements to avoid delays caused
by any migrated rows
- Make sure that at least one of these columns is indexed to
enable faster retrieval
Minimizing web server load:
- Optimize Windows 2000 for background processes (under Control Panel | System | Advanced |
Performance)
- Optimize IIS for >100,000 hits per day (under Internet Services Manager | Properties |
Performance).
- Run each ASPX page once using IE before each test is run to
compile it. This simply reduces the "warm-up" period so that less
of it may be needed.
Minimizing time for establishing database connections:
- Modify the initial connection pool size (default = 100) to the
number of simultaneous connections needed, if it is a higher
number. Minimum pool size is not required, since ACT can allow for
warm-up time.
Minimizing data transmission time:
- Use SQL Server, which enables direct access via the native
SQLClient Managed Provider.
- Use same machine as the client, web server and database
server. This gives >2Mb/s bandwidth, minimizing inter-tier
communication-related delays. However it is important to keep the
CPU load low (~90%).
Minimize unnecessary ASP.NET work:
- Ensure you are not running in Debug mode or writing to any unneeded logs
- Simulate data access within the memory structures by attaching
the row values to a DataList
control, but keeping this processing to a minimum. Compare this
performance to iterating through the data values without assigning
them to anything - an unrealistic scenario but one that can show
what part of the total time is spent on filling up the DataSet / Reader with data and then accessing them.
There are a number of additional tips (not used in our tests)
that may be applicable if you are carrying tests on your
application, or simply want to improve performance:
- Use sequential access of the SqlDataReader. By default, the SqlDataReader loads an entire row into
memory with each Read. This allows
for random access of columns within the current row. If this
random access is not necessary, for increased performance, pass
CommandBehavior.SequentialAccess to
the call to ExecuteReader. This
changes the default behavior of the SqlDataReader to only load data into
memory when it is requested. Note that once you have read past a
returned column, you can no longer read its value.
- Call Cancel on a SqlDataReader operation that still has
pending rows to be fetched. Otherwise, calling close on the
connection will first finish off any pending tasks and then close
the connection.
Our testing setup
Our tests will examine the variation of Response time and
Requests Per Second as a function of rows requested, columns
accessed, amount of processing done within the requesting ASP.NET
page, and whether a stored procedure is used or not. The diagram
below shows the overall test structure:
Let's look at the steps that will reproduce the testing setup on
your system:
- Copy the ASP.NET pages: Place the ASP.NET pages in a directory
of your choice under the IIS root (preferably a subdirectory
called "perf" to avoid changing the ACT scripts).
- Open the ACT test file: In the article download you will find
the ADONETPerformance.zip file. Unzip it anywhere in your
system and open the ADONETPerformance.act file from within
ACT. By clicking on each test you will see the VBScript code in
the right bottom frame - here is the code for the test
"DataReaderDirect":
Dim URL
Dim UB, LB
' Set the upperbound for rows
UB = 50000
' Set the lowerbound for rows
LB = 1
' Set the URL
URL = "http://localhost/perf/testDataReaderDir.aspx"
' Use the Randomize function to initialize the Rnd function
Randomize
Test.SendRequest(URL & "?emprows=1500&empid=" & int((UB - LB + 1)*Rnd + LB))
The test defines an upper and lower limit for generating a random
number. This number is sent as the "empid" URL parameter to the ASPX page, and
in turn to the stored procedure if used. The ASPX page takes the
"empid" value and requests rows that
have an EmployeeID value higher than
the one given. This introduces an element of randomness in which
rows are actually returned.
Before running the tests, you will need to modify the
initialization parameter "URL" to
point to the location of the ASP.NET pages copied in the previous
step.
- Create the test data table from which the rows of data will be
requested in our tests: Copy the SQL script files
"createdbtable.sql", "createspemp.sql",
"createspempstar.sql" to a directory of your choice. Open
the "SQL Profiler" program from the Microsoft SQL Server
program group, and open the SQL script "createdbtable.sql"
as shown below:
The SQL Profiler will allow you to run the script step by step -
you can execute it in its entirety or modify it if you wish. It will
create the structure of a table named "EmployeesACT"
identical to the existing "Employees" table under the
"Northwind" database:
- Create the stored procedures: Following the same process as in
the previous step, you can create the "[ dbo].[emp]" and
"[dbo].[empstar]" stored procedures using the other two SQL
scripts, "createspemp.sql", "createspempstar.sql".
You will need to edit them so that their Execute privilege is assigned to the ASP.NET client on your server. By opening
any of the scripts you can see that the last statement is:
GRANT EXECUTE ON [dbo].[emp] TO [your_server_name_here\ASPNET]
GO
- Modify both SQL script files to have the "your_server_name_here" become your
server's identifying name. You can then execute the script files
within SQL Profiler as noted above.
- Fill the test table with data: The download includes a Data
Transformation Package , "ACTimportdb.dts", that you can
run to fill the test table with sample data. It copies the
existing data from the existing Northwind Employees table multiple
times - if you prefer using your own scripts for doing this you
can skip this step. Otherwise, in the SQL Server Enterprise
Manager console tree, right- click Data
Transformation Services, and then click Open Package. In the Select File
dialog box, browse to the ACTimportdb.dts file, and then
click Open. Right-click on the
package, select Design Package to
enter design mode. From the menu "Package" choose Execute:
You will then see the following screen:
You may let the package run for some time until about 50,000 rows
have been inserted into the EmployeesACT table, and manually
cancel out of this operation. After that is done you are ready to
start testing.
- View/modify test parameters: Within ACT, you may right-click
on each test and view its individual properties. In the Properties screen you can modify the
number of simultaneous requests made by ACT, simulating
simultaneous users. To vary the number of rows requested each time
you need to modify each test's last line:
?emprows=NNNN
where NNNN is the number of rows
passed to the ASPX page.
Test Results And Analysis
First of all, please remember that the absolute results presented
here are the ones achieved on the specific setup used. Your results
will very probably vary - however comparative performances should be
the same.
As shown in the diagram earlier, we have divided our tests in
four groups. The cell shading indicates the comparison intention
between tests:
All detailed results can be seen in the download file
Results.xls - here we will present the main findings. The
main performance parameters tested are response speed (represented
by the time taken for the first byte to arrive), and throughput
(represented by requests per second that the page can handle).
Test Groups 1 and 2
The graph below shows the response speed of each method:
variation of the time taken for the first byte to arrive, versus
different simultaneous user connections. The number of rows
requested is 500:
The performance of all methods is quite close for a small number
of rows and hit rate. The methods utilizing a stored procedure have
a distinct advantage (lower time to first byte) over the dynamic SQL
methods, but there is little performance gain by using the SqlDataReader over the DataSet. The graph below shows the effect of
doubling the number of rows requested:
Here we can see that the performance of the SqlDataReader class diverges from the DataSet class as the number of simultaneous
hits increases. This results in up to 30% quicker response by the
SqlDataReader. The time to last byte
presents a similar pattern.
The Requests per second graph for 1000 rows indicates that the
SqlDataReader throughput is much less
affected with increasing hits, than DataSet:
The results of this test group indicate that the SqlDataReader class offers better
scalability, as the response time increases almost linearly with
increasing connections. This is considered very good performance -
to double the capacity of your website you only need to double the
hardware. Let's look at the effect of a more data intensive
application.
Test Groups 2 and 3
The above tests accessed the data of 2 table columns. By
increasing the column number to 10 we see that the SqlDataReader is less affected
proportionately - while the difference between the DataSet and SqlDataReader methods is 11% for 2 columns,
it more than doubles to 24% quicker response for SqlDataReader when we access 10 columns:
The DataSet appears to be as
scalable as the SqlDataReader in this
case. The number of columns accessed does benefit the relative
performance of SqlDataReader, however
not to the degree that an equivalent % increase in rows does.
The data that is used to build the DataList control items is still only the
first two columns, since we are not interested in assessing the
performance of ASPX page processing. Rather, we seek to minimize its
effect, as described in the next test group, where we avoid data
processing altogether.
Test Groups 3 and 4
While the above tests give a realistic picture of how data would
be expected to be processed, it is important to understand the
effect that pure data retrieval and access has on performance. The
fourth group of ACT tests requests the data as in group 3, but the
two column data is simply assigned to a dummy variable. This is
enough to perform the data access and data retrieval, without
performing any additional tasks. Together, these two events
represent an important part of the DataSet / Reader comparative performance. By comparing
the 10 column tests, the % contribution of data access/retrieval to
the total throughput cost increases.
The graph below shows how the throughput is affected by the
removal of data processing within the ASPX page:
Comparing the SqlDataReader trends
we see that at 1000 rows about 25% of the total throughput was being
lost to data processing. In contrast, processing the data that
existed in the DataSet contributed 33%
of the total Requests per sec. The gap in relative performance
increases with increasing amounts of data, until the DataSet page is unable to deliver the
requested data. It is evident that the DataSet subtracts proportionately more from
performance than the SqlDataReader, if
data binding is carried out for example.
Conclusion
From the considerations and technical analysis above we can see
that each class has been designed with quite different applications
in mind. The operational properties of each method are normally
enough to help decide between them, but this article has also
attempted to provide you with a measure of the absolute performance
comparison. Depending on the number of rows, columns, and
simultaneous users, a choice can be made between SqlDataReader and DataSet if both would deliver the desired
functionality. A summary of the performance findings could be
presented as below:
- If your application needs to access a low number of rows
and/or columns of data, then the benefit of a faster data access
via the SqlDataReader may not be
worth the reduced scalability at increased hits/second.
- If your application needs to acquire a large amount of data,
perhaps with ASP.NET control processing, then the cost of a
dedicated connection seems to be outweighed by the fact that the
data arrives (and is processed) faster, therefore giving
better scalability and throughput.
- Scalability of SqlDataReader has
not been an issue with our simple data retrieval statements (in
fact it seems better than for the DataSet), which indicates that only for
complicated data retrieval need that be a concern.
- An increase in the number of accessed columns seems to cause
less delay in response for the SqlDataReader than it does to the DataSet. Certainly, the effect is less for
both classes than the effect an increase in rows has.
Hopefully this has been a useful comparison of the two techniques
that will enable you to meaningfully evaluate the best option for
your application. |