Nov 23, 2011

Retrieving the First N Records from a SQL Query


Introduction
There are often times when retrieving results from a database that you only are interested in displaying the first N records from the resulting query. For example, a news website might have a database table that contains a record for each story. On the news website's homepage, they may want to display the 10 most recent stories from this table. Doing a SELECT * FROM NewsStoriesTable SQL query will return all of the records from that table, no just the 10 most recent ones. So how do they display just the 10 most recent?
There are a couple ways to display the first N records from a query. The most naive way is to simply return all of the records to the web page and then use programmatic logic to only show the first N. This is a poor approach because it requires that the database query all records from the table and transfer all of the data from the database to the web page when only a small subset of the total results will even be used. This approach's performance will worsen as the NewsStoriesTable grows over time.
A better approach is to use SQL Server's ROWCOUNT and/or TOP keywords. These two keywords actually limit the results SQL server queries and returns, thus reducing the burden both on retrieving the results from the database and returning them to the web page. In this article we'll look at both ROWCOUNT and TOP and see when to use each one.

Limiting Records with TOP
The TOP keyword was introduced with SQL Server 7.0 and provides a mean to limiting the results of a SQL SELECT query. With TOP you can specify to retrieve the first Nrecords or the first X percent of records. The syntax for TOP is as follows:

SELECT TOP N select list
FROM TableName

              -- or --

SELECT TOP X PERCENT select list
FROM TableName


      -- Concrete examples --
SELECT TOP 5 *
FROM authors

SELECT TOP 25 PERCENT au_lname, au_fname
FROM authors
WHERE zip = '92101'


Here N and X must be positive integer values; if you are using the second pattern, X must be between 0 and 100, inclusive.

Using TOP will access just the first N or X percent of records. Commonly such queries are accompanied by ORDER BY clauses. For example, in the example discussed earlier with a news website, to return the 10 most recent news articles you'd need to use an ORDER BY clause to return the results sorted by the PublicationDate field in descending order. From this, you'd want just the top 10 results:

SELECT TOP 10 NewsURL, ArticleTitle, ArticleAuthor
FROM NewsStoriesTable
ORDER BY PublicationDate DESC


Limiting Records with ROWCOUNT

Prior to SQL Server 7.0, the only way to limit results from a SQL query was to use the ROWCOUNT variable. With ROWCOUNT you'd proceed the actual SQL statement with a line like:


SET ROWCOUNT N


After which you could have your SQL statement. By setting ROWCOUNT to some number greater than zero, all subsequent SQL statements in the scope would process only the first N records.
ROWCOUNT still works with modern versions of SQL Server, and does have its time and place. Keep in mind, though, that ROWCOUNT applies not only to SELECT statements, but also to INSERTUPDATE, and DELETE statements as well. Additionally, ROWCOUNT's effects apply to triggers that may be firing as well.
Here's a simple example of using ROWCOUNT to retrieve the 10 most recent articles from the NewsStoriesTable table:

SET ROWCOUNT 10
SELECT NewsURL, ArticleTitle, ArticleAuthor
FROM NewsStoriesTable
ORDER BY PublicationDate DESC


When to Use TOP vs. When to Use ROWCOUNT

So what approach should you use, TOP or ROWCOUNT? If you are using SQL Server 6.5 or earlier, you'll have to use ROWCOUNT, as TOP wasn't introduced until SQL Server version 7.0. Of course, unless you are supporting a legacy system, chances are you are using at least SQL Server 7.0, if not SQL Server 2000 or SQL Server 2005.
In SQL Server 2005, TOP can do everything ROWCOUNT can. The TOP value can be a parameter, it can be applied to INSERTUPDATE, and DELETE statements. In his book SQL Server 2005 T-SQL Recipes, Joseph Sack advises:
"In previous versions of SQL Server, developers used SET ROWCOUNT to limit how many rows the query would return or impact. In SQL Server 2005, you should use the TOP keyword instead of SET ROWCOUNT as the TOP will usually perform faster."
For some examples of using TOP in SQL Server 2005, see "TOP" Clause in SQL Server 2005.
In SQL Server 7.0 and 2000, TOP does not have all the functionality SET ROWCOUNT offers (i.e., applying to non-SELECT statements, effecting triggers, etc.). But TOP works in a more straightforward manner, in my opinion. For example, with simple SELECTs you won't have any issues using SET ROWCOUNT, but if you are calling a stored procedure that does a SELECT but also processes other statements, with SET ROWCOUNT you'll need to be more careful to ensure that your SET ROWCOUNT N statement applies only to those queries intended.

With SQL Server 7.0 and 2000 there is one case where ROWCOUNT may be preferred in a SELECT query: when you want to let the web page developer specify how many results to return. For example, imagine you have a stored procedure named getRecentArticles that takes in as an integer input parameter named @ResultCount. Now, the stored procedure should return only the @ResultCount most recent articles (rather than always returning the 10 most recent). This allows the page developer to indicate if she wants to get back the 10 most recent articles or, say, the 25 most recent articles. Say we have the following:

CREATE PROCEDURE getRecentArticles
(
 @ResultCount   int
)
AS
  ... TODO: Write SQL statement to return the @ResultCount
               most recent news articles ...


This can be accomplished in SQL Server 7.0 and SQL Server 2000 using TOP only if you use dynamic SQL in your stored procedure. That is, you cannot do the following:

CREATE PROCEDURE getRecentArticles
(
 @ResultCount   int
)
AS

SELECT TOP @ResultCount NewsURL, ArticleTitle, ArticleAuthor
FROM NewsStoriesTable
ORDER BY PublicationDate DESC


Instead you will need to dynamically craft a string in the stored procedure and then execute it using EXEC or sp_executesql. This approach, while possible, tends to lead to messy and less maintainable SQL stored procedures, in my opinion. (If you are interested in learning more on using dynamic SQL in your stored procedures be sure to read my earlier article, Using Dynamic SQL Statements in Stored Procedures.)

With ROWCOUNT, however, you can just slap in the @ResultCount input parameter like so:

CREATE PROCEDURE getRecentArticles
(
 @ResultCount   int
)
AS

SET ROWCOUNT @ResultCount

SELECT NewsURL, ArticleTitle, ArticleAuthor
FROM NewsStoriesTable
ORDER BY PublicationDate DESC


Finally, if you are wanting to limit the results performed by an INSERTUPDATE, or DELETE, you'll have to use ROWCOUNT as TOP only applies to SELECT statements.

Conclusion

In this article we examined how to return the first N records from a SQL query. Specifically we looked at two SQL Server keywords, TOP and ROWCOUNT, examining sample SQL syntax for each. Finally, we compared and contrasted the two, looking at when to TOP versus when to use ROWCOUNT.

0 comments:

Post a Comment

Your comments:

Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More