Tuesday, August 25, 2009

Uncommon Common Table Expressions (CTEs)

Common table expressions (CTEs) were introduced with SQL 2005. They are a powerful new construct for building advanced queries.

The Microsoft CTE Article describes 4 main uses for CTEs:

  • Recursive queries
  • Inline views
  • Grouping by derived columns
  • Reuse of the same table

It still amazes me how many people on the new SQL platform actually aren't familiar with or don't use CTEs. I'm going to share two quick-and-dirty examples to demonstrate just how powerful they can be.

Hierarchical Data

I've blogged a bit about hierarchical data in the past (read Hierarchical Data Templates in Silverlight for an example). SQL makes it far easier to retrieve data in a hierarchical fashion using CTEs.

Let's assume you have a content management site with articles, and articles can be grouped into hierarchical categories (i.e. one category may contain others). There are several ways to approach this, but one way is to have the main category table referenced by another that encapsulates the relationships, like this:

Category Example

We'll start with an Articles section that has Whitepapers as a subcategory, which also has "Technology Related" as a subcategory, then add a FAQ which has "Using the Site" and "Logging Out" as subcategories. Our main table looks like this:

CategoryID Category
1Articles
2FAQ
3Whitepapers
4Using the Site
5Logging Out
7Technology Related

To show the relationships, we'll enter records into CategoryLevel like this:

Parent Child
13
24
25
37

One real-world challenge would be showing a breadcrumb when the user is viewing a technology article. Given the child node (7, Technology Related) how do we recursively traverse the hierarchy to show all of the parent nodes?

Using a common table expression, we would alias the "top" of our query (the selection of the current category) and then join that to recursively traverse the tree, like this:

with RecursiveCat(CategoryID,Category) 
 as (
  select
    CategoryID,
    Category
  from dbo.Categories
  where CategoryID = 7
  union all 
  select
    L.ParentCategoryID as CategoryID, 
    Categories.Category 
  from dbo.CategoryLevels L 
  join RecursiveCat c 
    on c.CategoryID = L.ChildCategoryID
  inner join dbo.Categories 
    on L.parentCategoryID = Categories.CategoryID
 )
 select * from RecursiveCat

The key here is to note the "union all" with the join. The top query "kicks it off" and selects the target node. The union describes a recursive template ... on the inside, where we reference the CategoryLevels table that describes the parent-child relationships, we join to the Common Table Expression (CTE) on the outside. This means for each record, it joins back to itself and "unrolls" until the full hierarchy is exposed.

Running the above query yields:

CategoryID Category
7Technology Related
3Whitepapers
1Articles

As it walks up the tree.

Ranked Data

The other most common use is for ranked data. Let's take the above example even further, and declare an articles table:

The articles simple belong to a category and have a date when they were published. I'm leaving out details like title, body, etc for now.

A very common way to feature articles on a content website is to show the most recent article published to a given category. Assuming you have dozens of articles, what is the easiest way to do that?

Of course, if you are pragmatic, you'll probably put a MostRecentArticleID column on the Categories table or even introduce a new table to hold that. However, if you want to have fun (and let me keep my excuse for giving an example) you can use a rank function (we're not yet to the Common Table Expression, so bear with me). For now, try to ignore my typo as ArticlePublishData should be ArticlePublishDate. The first query will give you an idea of how the RANK works:

SELECT CategoryID, ArticleID, ArticlePublishData,
     ROW_NUMBER() OVER (PARTITION BY CATEGORYID ORDER BY ArticlePublishData DESC) As "MyRank"
   FROM Articles 

When we run this, we get all of the articles, but if you notice, the "MyRank" column resets for each category. Within a category, it starts at the most date date with "1" and then increments. This is because we are partitioning, or telling the CTE to reset the ranking, by the categories, and then ordering within those categories by the publish date. In this example, both 1 and 2 are in the same category, then 3 is in a new category.

CategoryIDArticleID ArticlePublishData Rank
122008-12-13 00:00:00:00 1
112006-12-14 00:00:00:00 2
232009-01-02 00:00:00:00 1

Now, it's just a simple matter to add a filter using a Common Table Expression. We'll alias the ranking function to extract the row, and then filter on the row:

;with CTE(CategoryID, ArticleID, ArticlePublishData, Row) AS (
SELECT CategoryID, ArticleID, ArticlePublishData,
     ROW_NUMBER() OVER (PARTITION BY CATEGORYID ORDER BY ArticlePublishData DESC) As "MyRank"
   FROM Articles) 
SELECT * from CTE
where Row = 1

This will give us only the most recent article for each category.

Hopefully this has been a useful, quick, and easy introduction to Common Table Expressions. Just search online for the phrase and you'll find plenty of examples and tutorials to help you take your SQL skills to the next level.

Jeremy Likness