ADO.NET Performance
by Costas Hadjisotiriou
Categories: Data Access, Performance, .NET Framework
Published on November 28, 2002
 
 
Abstract
This article will discuss performance aspects of two data access methods in ADO.NET: the disconnected SqlDataAdapter/DataSet method and the connected SqlDataReader method. Each of these requires different amounts of overhead for acquiring the same data. Therefore, the choice between the two affects performance, scalability, maintainability, and usability, and warrants careful consideration.

Costas Hadjisotiriou will carry out a qualitative and quantitative examination of each method to deduce when they are best used.
 

Article Information
Author Costas Hadjisotiriou
Chief Technical Editor John R. Chapman
Project Manager Helen Cuthill
Reviewers Saurabh Nandu & J. Boyd Nolan

 
Article

Introduction

To enable a realistic assessment of each data access class we would ideally isolate the effects of .NET data processing from all external factors. Such external factors include network latency, client overload, web server overload, and even factors within the database, e.g. index utilization and SQL statement parsing. Unfortunately, while we can measure some (but not all) of these items directly, it is much easier to simply minimize their relative influence on our measurements and ensure that the effect remains constant for all tested methods. A detailed description of our methodology is given under the "Testing Considerations" heading.

Before the quantitative testing, however, we will try to identify situations where a choice between access methods can be made by considering the whole application environment and data usage. The result will be a flow diagram that can help you make the right architectural decision.

System Requirements

To run the code supplied you will need:

  • The .NET Framework version 1.0
  • SQL Server 2000
  • Microsoft Application Center Test (ACT) - part of Visual Studio .NET Enterprise

Testbed setup

  • Hardware: DELL Dimension 4300, CPU: Intel Pentium 4 1.6GHz, RAM: 130,064 KB, 20GB disk
  • Software: Windows 2000 Server SP2
  • Environment settings: system optimized for background processes, IIS5 optimized for >100,000 hits/day

Qualitative comparison

Strictly speaking, when you see references to DataReader in documents, the class implied is SqlDataReader. It is the functionality of that class that we will be comparing to the DataSet class. A diagram will make the subsequent definitions clearer:

As you can see, the bridge between a DataSet and SQL Server is the SqlDataAdapter, which enables both retrieving and saving data; a two-way process. The DataSet can contain more than one DataTable, with constraints and relations between DataTables; it can therefore be imagined as a mini-database in memory. In contrast, the SqlDataReader provides a forward-only, read-only stream of data from the database, which is not inserted into any table structure. To create a SqlDataReader, you must call the ExecuteReader method of the SqlCommand object rather than using a constructor directly.
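To make this concrete, here is a minimal C# sketch of both paths. The connection string and query are placeholders rather than code from the article download; note how the SqlDataAdapter opens and closes the connection itself, while the SqlDataReader must come from ExecuteReader on an open connection:

using System.Data;
using System.Data.SqlClient;

string connStr = "server=(local);database=Northwind;Trusted_Connection=yes";

// Disconnected path: the SqlDataAdapter bridges SQL Server and the DataSet
DataSet ds = new DataSet();
SqlDataAdapter da = new SqlDataAdapter(
    "SELECT EmployeeID, LastName FROM Employees", new SqlConnection(connStr));
da.Fill(ds, "Employees");    // Fill opens and closes the connection for us
// ... work with ds offline; da.Update(ds, "Employees") would save changes back

// Connected path: a SqlDataReader comes from ExecuteReader, never a constructor
SqlConnection conn = new SqlConnection(connStr);
conn.Open();
SqlCommand cmd = new SqlCommand("SELECT EmployeeID, LastName FROM Employees", conn);
SqlDataReader dr = cmd.ExecuteReader();   // forward-only, read-only stream
dr.Close();
conn.Close();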

There are occasions where the choice between using the DataSet or SqlDataReader class should be clear - after all, they were designed with different specifications in mind. To determine which one to use when you design your application, start by considering the functionality that is needed in the application.

Use the DataSet in order to do the following with your application:

  • Manipulate data from multiple sources (for example, a mixture of data from more than one database, from an XML file, and from a spreadsheet).
  • Use or modify the data's relationships (a sketch follows this list). If the relationship itself is not important, you can consider retrieving the tables separately with a SqlDataReader; however, their data will not be connected. You could also conceivably use a stored procedure or views to bring together one-to-one or one-to-many relationship data (fewer round trips from web server to database server), but more data will be transmitted.
  • Exchange data between tiers or using an XML Web service. Unlike the SqlDataReader, the DataSet can be passed to a remote client.
  • Reuse the same set of rows to achieve a performance gain by caching them (such as for sorting, searching, or filtering the data).
  • Perform a large amount of processing per row. Once the SqlDataReader starts acquiring the data, no other operation can be performed on the SqlConnection other than closing it. Extended processing on each row returned using a SqlDataReader ties up the connection serving the SqlDataReader exclusively. The sharing of connections with the SqlDataAdapter means that a benefit may be derived by using the DataSet if much processing per row is to be performed.
  • Require scalability as an absolute must. As noted above, holding the connection exclusively makes the application more difficult to scale up because connection contention might occur. However, see the last point in the SqlDataReader list below.
  • Manipulate data using XML operations such as Extensible Stylesheet Language Transformations (XSLT transformations) or XPath queries. However, if you just intend to present the data as XML then you might want to consider using the SqlDataReader with a stored procedure that executes a "SELECT FOR XML" statement.
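As an illustration of the "mini-database in memory" idea and of in-memory relationships, the following sketch caches two related Northwind tables in one DataSet and navigates the relation without further database round trips. It assumes the connStr and namespaces from the earlier sketch; the table and column names are simply the familiar Northwind ones:

DataSet ds = new DataSet();
SqlDataAdapter da = new SqlDataAdapter(
    "SELECT CustomerID, CompanyName FROM Customers", new SqlConnection(connStr));
da.Fill(ds, "Customers");
da.SelectCommand.CommandText = "SELECT OrderID, CustomerID FROM Orders";
da.Fill(ds, "Orders");

// A one-to-many relation lets us navigate parent/child rows while disconnected
ds.Relations.Add("CustOrders",
    ds.Tables["Customers"].Columns["CustomerID"],
    ds.Tables["Orders"].Columns["CustomerID"]);

foreach (DataRow cust in ds.Tables["Customers"].Rows)
{
    DataRow[] orders = cust.GetChildRows("CustOrders");   // no database hit
}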

Use the SqlDataReader in your application if you:

  • Do not need to cache the data, and processing is done on the fly
  • Are processing a set of results too large to fit into memory
  • Need to quickly access data once, in a forward-only and read-only manner, for example loading up a list of items to present in a listbox or combo box. Usually, the database server is more suited to retrieving data, so this should be done in combination with stored procedures, if possible.
  • Need minimal Web server CPU usage as an absolute must. The SqlDataAdapter actually uses a SqlDataReader internally to fill a DataSet. Therefore, the performance gained by using the SqlDataReader instead of the DataSet is exactly the extra processor cycles that would have been consumed in filling the DataSet - the data is instead fed directly to your application.
  • Can accept holding the connection exclusively (see the sketch after this list). Exclusive use of the connection implies contention when scaling up; however, this is conditional upon the complexity of the actual data retrieval, i.e. the faster retrieval of the SqlDataReader might actually result in better scalability because the connection is held for much less time.
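A typical forward-only loop is sketched below, again assuming the earlier connStr. Passing CommandBehavior.CloseConnection ties the connection's lifetime to the reader, so closing the reader releases the connection as early as possible:

SqlConnection conn = new SqlConnection(connStr);
conn.Open();
SqlCommand cmd = new SqlCommand("SELECT EmployeeID, LastName FROM Employees", conn);
SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
while (dr.Read())
{
    // the connection serves this reader exclusively until Close is called
    string name = dr.GetString(1);   // e.g. add the value to a listbox item
}
dr.Close();   // also closes the connection, thanks to CloseConnection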

In life nothing is ever clear-cut - there certainly are situations where both data access methods would be applicable. For example:

  • If you are looking for a specific value from a specific row multiple times (assuming this value cannot be processed via SQL statements). You can use multiple SqlDataReader calls that quickly close the connections, or open a DataSet, create an index in memory, and search for all the wanted values in disconnected mode.
  • If reading the data is part of some larger operation that may be better performed by opening a DataSet after a SqlDataReader, or just a DataSet right from the beginning.

For these sorts of situations it is useful to have a quantitative assessment of each individual method's performance. It is important to emphasize, however, that absolute performance should only form part of the wider consideration - business rules, maintenance aspects or simply functionality might well be overriding factors in your decision-making.

Testing Considerations

Although, strictly speaking, data access techniques should be tested directly rather than behind a Web server, the latter is realistic for common application scenarios. In any case, because we are comparing the relative performance of these data access techniques, the overhead of testing with a Web server will be the same for both, and it makes testing via Microsoft Application Center Test much easier. Note that we are not trying to minimize data access time per se, so we are not using optimized code such as exposing a stored procedure as an XML Web service. Ideally, however, we need to isolate the DataSet/Reader effects, so we need to maximize their relative effect, i.e. minimize random network latency, effects deriving from SQL statement parsing, unrelated ASP.NET operations, and so on. In practical terms, this implies the following:

Minimizing database server load:

  • Use stored procedures to ensure that the SQL statement is precompiled
  • Select only a small number of columns in our SELECT statements to avoid delays caused by any migrated rows
  • Make sure that at least one of these columns is indexed to enable faster retrieval

Minimizing web server load:

  • Optimize Windows 2000 for background processes (under Control Panel | System | Advanced | Performance)
  • Optimize IIS for >100,000 hits per day (under Internet Services Manager | Properties | Performance).
  • Run each ASPX page once using IE before each test is run, to compile it. This reduces the "warm-up" period the test needs.

Minimizing time for establishing database connections:

  • Modify the maximum connection pool size (default = 100) to the number of simultaneous connections needed, if that is a higher number, as sketched below. A minimum pool size is not required, since ACT can allow for warm-up time.
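Pool sizing is controlled through the connection string; a hedged example follows (the value 200 is arbitrary, chosen only to exceed the expected number of simultaneous connections):

// "Max Pool Size" defaults to 100; raise it only if more simultaneous
// connections are expected. "Min Pool Size" can stay at its default of 0.
string connStr = "server=(local);database=Northwind;Trusted_Connection=yes;"
               + "Max Pool Size=200";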

Minimizing data transmission time:

  • Use SQL Server, which enables direct access via the native SQLClient Managed Provider.
  • Use the same machine as the client, web server, and database server. This gives >2Mb/s bandwidth, minimizing inter-tier communication delays. However, it is important to keep the CPU from saturating (below ~90% load).

Minimize unnecessary ASP.NET work:

  • Ensure you are not running in Debug mode or writing to any unneeded logs
  • Simulate data access within the memory structures by attaching the row values to a DataList control, but keeping this processing to a minimum (a sketch of both variants follows this list). Compare this performance to iterating through the data values without assigning them to anything - an unrealistic scenario, but one that can show what part of the total time is spent on filling up the DataSet/Reader with data and then accessing it.
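The two variants might look like the following fragment. DataList1 is an illustrative control ID and EmployeesACT the test table created later; this is a sketch, not the download code:

// Realistic variant: hand the retrieved rows to an ASP.NET DataList
DataList1.DataSource = ds.Tables["EmployeesACT"].DefaultView;
DataList1.DataBind();

// Unrealistic baseline: touch each value without using it, to time pure access
foreach (DataRow row in ds.Tables["EmployeesACT"].Rows)
{
    object dummy = row[0];
}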

There are a number of additional tips (not used in our tests) that may be applicable if you are carrying out tests on your own application, or simply want to improve performance:

  • Use sequential access with the SqlDataReader (see the sketch after this list). By default, the SqlDataReader loads an entire row into memory with each Read, which allows random access to the columns of the current row. If this random access is not necessary, pass CommandBehavior.SequentialAccess to the ExecuteReader call for increased performance. This changes the default behavior of the SqlDataReader to load data into memory only as it is requested. Note that once you have read past a returned column, you can no longer read its value.
  • Call Cancel on a SqlDataReader operation that still has pending rows to be fetched. Otherwise, calling close on the connection will first finish off any pending tasks and then close the connection.
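Both tips appear in the fragment below; the column list is illustrative and conn is an open SqlConnection as in the earlier sketches:

SqlCommand cmd = new SqlCommand("SELECT EmployeeID, Notes FROM Employees", conn);

// Tip 1: load column data only as requested, strictly left to right
SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
if (dr.Read())
{
    int id = dr.GetInt32(0);         // once read past, column 0 is gone
    string notes = dr.GetString(1);
}

// Tip 2: cancel pending rows before closing, so that Close does not
// first fetch the remainder of the result set
cmd.Cancel();
dr.Close();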

Our testing setup

Our tests will examine the variation of Response time and Requests Per Second as a function of rows requested, columns accessed, amount of processing done within the requesting ASP.NET page, and whether a stored procedure is used or not. The diagram below shows the overall test structure:

Let's look at the steps that will reproduce the testing setup on your system:

  • Copy the ASP.NET pages: Place the ASP.NET pages in a directory of your choice under the IIS root (preferably a subdirectory called "perf" to avoid changing the ACT scripts).
  • Open the ACT test file: In the article download you will find the ADONETPerformance.zip file. Unzip it anywhere on your system and open the ADONETPerformance.act file from within ACT. By clicking on each test you will see its VBScript code in the bottom-right frame - here is the code for the test "DataReaderDirect":
Dim URL
Dim UB, LB
' Set the upperbound for rows
UB = 50000
' Set the lowerbound for rows
LB = 1
' Set the URL
URL  = "http://localhost/perf/testDataReaderDir.aspx"
' Use the Randomize function to initialize the Rnd function
Randomize
Test.SendRequest(URL & "?emprows=1500&empid=" & int((UB - LB + 1)*Rnd + LB))

The test defines an upper and lower limit for generating a random number. This number is sent as the "empid" URL parameter to the ASPX page, and in turn to the stored procedure if used. The ASPX page takes the "empid" value and requests rows that have an EmployeeID value higher than the one given. This introduces an element of randomness in which rows are actually returned.
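The page itself is not listed in the article text, but a hypothetical reconstruction of what a page like testDataReaderDir.aspx might do with these two parameters is sketched below; the SQL and variable names are assumptions, not the download code:

// Hypothetical code-behind fragment for the "direct" (dynamic SQL) reader test
int empRows = int.Parse(Request.QueryString["emprows"]);
int empId = int.Parse(Request.QueryString["empid"]);

SqlConnection conn = new SqlConnection(connStr);
conn.Open();
SqlCommand cmd = new SqlCommand(
    "SELECT TOP " + empRows + " EmployeeID, LastName " +
    "FROM EmployeesACT WHERE EmployeeID > " + empId, conn);
SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
while (dr.Read())
{
    object dummy = dr.GetValue(0);   // minimal per-row processing
}
dr.Close();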

Before running the tests, you will need to modify the initialization parameter "URL" to point to the location of the ASP.NET pages copied in the previous step.

  • Create the test data table from which the rows of data will be requested in our tests: Copy the SQL script files "createdbtable.sql", "createspemp.sql", and "createspempstar.sql" to a directory of your choice. Open the "SQL Query Analyzer" program from the Microsoft SQL Server program group, and open the SQL script "createdbtable.sql" as shown below:

SQL Query Analyzer will allow you to run the script step by step - you can execute it in its entirety or modify it if you wish. It will create the structure of a table named "EmployeesACT", identical to the existing "Employees" table in the "Northwind" database:

  • Create the stored procedures: Following the same process as in the previous step, you can create the "[dbo].[emp]" and "[dbo].[empstar]" stored procedures using the other two SQL scripts, "createspemp.sql" and "createspempstar.sql". You will need to edit them so that their Execute privilege is assigned to the ASP.NET client on your server. By opening either of the scripts you can see that the last statement is:
GRANT  EXECUTE  ON [dbo].[emp]  TO [your_server_name_here\ASPNET]
GO
  • Modify both SQL script files, replacing "your_server_name_here" with your server's identifying name. You can then execute the script files within SQL Query Analyzer as noted above.
  • Fill the test table with data: The download includes a Data Transformation Services package, "ACTimportdb.dts", that you can run to fill the test table with sample data. It copies the data from the existing Northwind Employees table multiple times - if you prefer using your own scripts for doing this, you can skip this step. Otherwise, in the SQL Server Enterprise Manager console tree, right-click Data Transformation Services, and then click Open Package. In the Select File dialog box, browse to the ACTimportdb.dts file, and then click Open. Right-click the package and select Design Package to enter design mode. From the "Package" menu choose Execute:

You will then see the following screen:

Let the package run until about 50,000 rows have been inserted into the EmployeesACT table, then manually cancel the operation. After that is done you are ready to start testing.

  • View/modify test parameters: Within ACT, you may right-click on each test and view its individual properties. In the Properties screen you can modify the number of simultaneous requests made by ACT, simulating simultaneous users. To vary the number of rows requested each time you need to modify each test's last line:
?emprows=NNNN

where NNNN is the number of rows passed to the ASPX page.

Test Results And Analysis

First of all, please remember that the absolute results presented here are the ones achieved on the specific setup used. Your results will very probably vary; the comparative performance, however, should be the same.

As shown in the diagram earlier, we have divided our tests into four groups. The cell shading indicates the intended comparisons between tests:

All detailed results can be seen in the download file Results.xls - here we will present the main findings. The main performance parameters tested are response speed (represented by the time taken for the first byte to arrive), and throughput (represented by requests per second that the page can handle).

Test Groups 1 and 2

The graph below shows the response speed of each method: variation of the time taken for the first byte to arrive, versus different simultaneous user connections. The number of rows requested is 500:

The performance of all methods is quite close for a small number of rows and hit rate. The methods utilizing a stored procedure have a distinct advantage (lower time to first byte) over the dynamic SQL methods, but there is little performance gain by using the SqlDataReader over the DataSet. The graph below shows the effect of doubling the number of rows requested:

Here we can see that the performance of the SqlDataReader class diverges from the DataSet class as the number of simultaneous hits increases. This results in up to 30% quicker response by the SqlDataReader. The time to last byte presents a similar pattern.

The requests-per-second graph for 1000 rows indicates that the SqlDataReader's throughput is much less affected by increasing hits than the DataSet's:

The results of this test group indicate that the SqlDataReader class offers better scalability, as the response time increases almost linearly with increasing connections. This is considered very good performance - to double the capacity of your website you only need to double the hardware. Let's look at the effect of a more data intensive application.

Test Groups 2 and 3

The above tests accessed the data of two table columns. By increasing the number of columns to ten we see that the SqlDataReader is proportionately less affected: while the difference between the DataSet and SqlDataReader methods is 11% for 2 columns, it more than doubles to a 24% quicker response for the SqlDataReader when we access 10 columns:

The DataSet appears to be as scalable as the SqlDataReader in this case. Increasing the number of columns accessed does benefit the relative performance of the SqlDataReader, though not to the degree that an equivalent percentage increase in rows does.

The data that is used to build the DataList control items is still only the first two columns, since we are not interested in assessing the performance of ASPX page processing. Rather, we seek to minimize its effect, as described in the next test group, where we avoid data processing altogether.

Test Groups 3 and 4

While the above tests give a realistic picture of how data would be expected to be processed, it is important to understand the effect that pure data retrieval and access has on performance. The fourth group of ACT tests requests the data as in group 3, but the two-column data is simply assigned to a dummy variable. This is enough to perform the data access and retrieval without performing any additional tasks; together, these two events represent an important part of the DataSet/Reader comparative performance. Comparing the 10-column tests, the percentage contribution of data access/retrieval to the total throughput cost increases.

The graph below shows how the throughput is affected by the removal of data processing within the ASPX page:

Comparing the SqlDataReader trends, we see that at 1000 rows about 25% of the total throughput was being lost to data processing. In contrast, processing the data held in the DataSet cost 33% of the total requests per second. The gap in relative performance widens with increasing amounts of data, until the DataSet page is unable to deliver the requested data. It is evident that the DataSet subtracts proportionately more from performance than the SqlDataReader when, for example, data binding is carried out.

Conclusion

From the considerations and technical analysis above we can see that each class has been designed with quite different applications in mind. The operational properties of each method are normally enough to help decide between them, but this article has also attempted to provide you with a measure of the absolute performance comparison. Depending on the number of rows, columns, and simultaneous users, a choice can be made between SqlDataReader and DataSet if both would deliver the desired functionality. A summary of the performance findings could be presented as below:

  • If your application needs to access a low number of rows and/or columns of data, then the benefit of faster data access via the SqlDataReader may not be worth the reduced scalability at increased hits/second.
  • If your application needs to acquire a large amount of data, perhaps with ASP.NET control processing, then the cost of a dedicated connection seems to be outweighed by the fact that the data arrives (and is processed) faster, therefore giving better scalability and throughput.
  • Scalability of the SqlDataReader has not been an issue with our simple data retrieval statements (in fact it seems better than the DataSet's), which indicates that it need be a concern only for complicated data retrieval.
  • An increase in the number of accessed columns seems to cause less delay in response for the SqlDataReader than it does for the DataSet. Certainly, the effect on both classes is smaller than the effect of an increase in rows.

Hopefully this has been a useful comparison of the two techniques that will enable you to meaningfully evaluate the best option for your application.

 
