Common table expressions, often referred to as CTEs, are a commonly taught tool used in SQL. In my own experience supporting other developers’ code, CTEs have proven to be both a boon and a bane. They are incredibly useful tools and should be in every developer’s toolbox, regardless of their level of experience. However, like any tool, it should be one of many that are used. It is very easy to misuse CTEs.
My predecessor learned about CTEs and then used them in every query he wrote. Even simple queries where the CTE proved to just be “code sugar.” I have also attended conference sessions on reporting where the speakers insisted that CTEs were mandatory for writing quality code. Since developers of varied levels were involved, that may not have been the best message. I have also experienced the opposite where one, or many, CTEs were appropriate for the query.
CTEs are a tool and in many cases, not the first tool to reach for. I work in the construction industry where knowing the strengths and weaknesses of the tools used is critical to ensure proper usage. Proper tool usage trumps cleverness in every situation, whether the tools are physical or digital.
Quality code has several attributes. Among these are readability and maintainability. Many enterprise-level developers will consider these not just important attributes, but critical attributes. This is especially when the enterprise has multiple developers working on the same code base. These attributes are not just about understanding the code; it is about understanding the intent of the code.
For these purposes, CTEs are an excellent tool. By moving key sub-queries into CTEs, it is easier to read that code and understand the purpose. It is also easier to modify and test the code inside of a CTE. For complicated queries, CTEs could often be seen as building blocks for the complex sections of the query.
CTEs also make join sub-queries easier to maintain. In my case, I often only select a small portion of cumbersome tables. I would create a sub-query for this. If I place the sub-query in a CTE, it is easy-to-find, and if the requirements change, the CTE is easy to modify.
Code reuse is another excellent use of CTEs. There are many cases where I need to use the same sub-query several times. In a case like this, I would use a CTE to encapsulate the repeated sub-query and use the CTE in the place of duplication.
Another situation where a CTE is a good tool is where business rules need to be applied. Imagine a stored procedure querying payroll information where a user can only see a limited set of employees. A CTE could be used to explicitly define the business rules at the beginning of the procedure query. It would be an easy to find piece of code for maintenance.
CTEs are not always the best choice. This is especially true on queries where performance is critical. The SQL Server optimizer handles CTEs by inserting the sub-query inline while compiling the query, repeating the code as needed. This could create poor query performance because CTEs are not materialized during the query in general. There are other SQL tools that can be used and some of these will create materialized data sets.
Additionally, using a CTE with simple queries would be much like using a hammer to kill a fly. It would work. However, it would be massive overkill. Keep simple queries simple. “It makes me look smart” should not be the only reason a CTE is used. There should be additional reasons to use a CTE.
Not all code that is easy to read executes well. Query performance affects the user experience directly. Readability and performance should always be balanced. In my experience, ignoring either will produce a bad user experience.
For this, there are alternatives to using CTEs. Because of how the optimizer handles CTEs, inline sub-queries are always an option. In most cases, there is no performance difference between using a CTE and an inline sub-query.
Temporary tables are a performance-friendly option. These are materialized in the tempdb database. They can be indexed and used like any other SQL table. Like their SQL counterparts, temporary tables must be created and dropped.
Table variables are not always a performance-friendly alternative, but they are a valid alternative. Like a temporary table, the data in a table variable is materialized. However, the optimizer assumes that the table variable only contains a single row. This could lead to skewed execution plans.
Database views are also an alternative to CTEs. In fact, many applications use database views to apply some form of business logic. Performance while using views will be affected as the optimizer treats a view the same as a CTE is treated. The code for the view is included in the query where it is used.
There is an exception to this rule. If the view is indexed, the associated data is materialized. Then, it is treated like the other materialized alternatives. Indexed views have limitations that may not allow their creation.
Each of these alternatives should be a tool in the developer’s toolbox. The developer should also know how each tool is used. This will tell them when the tool should be used.
Let’s look at some examples of the use and the misuse of CTEs. This first example is a query that should use a CTE.
/*
PROBLEM EXAMPLE — Missing CTE where it adds value
Scenario:
- Calculate total sales per (Employee, Customer)
- Find each employee’s top customer
- Keep only those above the overall average
Issue:
- The same aggregation logic is written twice
- Business rules are duplicated
- Any change must be made in multiple places
This is a classic case where a CTE would improve
maintainability and reduce risk.
*/
SELECT e.EmployeeID,
e.LastName,
e.FirstName,
x.CustomerID,
x.TotalSales
FROM Employees e
JOIN
(
SELECT o.EmployeeID,
o.CustomerID,
SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalSales,
ROW_NUMBER() OVER
(
PARTITION BY o.EmployeeID
ORDER BY SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) DESC
) AS rn
FROM Orders o
JOIN [Order Details] od ON od.OrderID = o.OrderID
GROUP BY o.EmployeeID,
o.CustomerID
) x ON x.EmployeeID = e.EmployeeID
WHERE x.rn = 1
AND x.TotalSales >
(
SELECT AVG(t.TotalSales)
FROM
(
SELECT o2.EmployeeID,
o2.CustomerID,
SUM(od2.UnitPrice * od2.Quantity * (1 - od2.Discount)) AS TotalSales
FROM Orders o2
JOIN [Order Details] od2 ON od2.OrderID = o2.OrderID
GROUP BY o2.EmployeeID,
o2.CustomerID
) t
)
ORDER BY x.TotalSales DESC;
Notice that the aggregation sub-query is repeated twice applying the business rules. This situation is exactly what a CTE is designed for. Here is the same query, using a CTE.
/*
SOLUTION — CTE used for code reuse and clarity
Improvements:
- Aggregation logic defined once
- Clear separation of concerns:
* aggregation
* ranking
* filtering
- Business rules are centralized
Note:
- This improves readability and maintainability
- It does NOT guarantee a different execution plan
*/
WITH SalesByEmployeeCustomer AS
(
SELECT o.EmployeeID,
o.CustomerID,
SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS TotalSales
FROM Orders o
JOIN [Order Details] od ON od.OrderID = o.OrderID
GROUP BY o.EmployeeID,
o.CustomerID
),
RankedCustomerPerEmployee AS
(
SELECT EmployeeID,
CustomerID,
TotalSales,
ROW_NUMBER() OVER
(
PARTITION BY EmployeeID
ORDER BY TotalSales DESC
) AS rn
FROM SalesByEmployeeCustomer
),
OverallAverageSales AS
(
SELECT AVG(TotalSales) AS AvgTotalSales
FROM SalesByEmployeeCustomer
)
SELECT e.EmployeeID,
e.LastName,
e.FirstName,
r.CustomerID,
r.TotalSales
FROM Employees e
JOIN RankedCustomerPerEmployee r ON r.EmployeeID = e.EmployeeID
CROSS JOIN OverallAverageSales a
WHERE r.rn = 1
AND r.TotalSales > a.AvgTotalSales
ORDER BY r.TotalSales DESC;
Notice that the logic is not repeated, is much easier to follow, and the query is easier to read, overall. Another advantage of the CTE here is that changes to the logic only need to be applied once. This makes the query much more maintainable; the intent of the code is clear in the CTE.
Here is an example of where a CTE is not necessary.
/*
MISUSE — CTE as syntax sugar
Characteristics:
- Referenced exactly once
- No reuse
- No complex transformation isolated
- No real readability gain
This CTE adds an extra named layer
without earning its keep.
*/
WITH OrdersWithEmployeeAndCustomer AS
(
SELECT o.OrderID,
o.OrderDate,
o.EmployeeID,
e.LastName,
e.FirstName,
o.CustomerID,
c.CompanyName
FROM Orders o
JOIN Employees e ON e.EmployeeID = o.EmployeeID
JOIN Customers c ON c.CustomerID = o.CustomerID
)
SELECT OrderID,
OrderDate,
EmployeeID,
LastName,
FirstName,
CustomerID,
CompanyName
FROM OrdersWithEmployeeAndCustomer
WHERE OrderDate >= '1997-01-01'
ORDER BY OrderDate,
OrderID;
The base query here is simple enough that the CTE does nothing for readability or maintainability. It makes the query seem like the developer is using a CTE just to use an advanced technique. This is a CTE usage I have seen too many times to count.
/*
PREFERRED FORM — No unnecessary CTE
Same query.
Same results.
Less indirection.
When a CTE adds no reuse or clarity,
it is usually noise.
*/
SELECT o.OrderID,
o.OrderDate,
o.EmployeeID,
e.LastName,
e.FirstName,
o.CustomerID,
c.CompanyName
FROM Orders o
JOIN Employees e ON e.EmployeeID = o.EmployeeID
JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '1997-01-01'
ORDER BY o.OrderDate,
o.OrderID;
Without the CTE here, the code is just as easy to read. There is no code reuse to address. On top of that, the code is easier to debug and test. There is also less code where the style can drift from styling standards.
While not all enterprises establish styling standards for code, here are a few notes on styling your code. Quality code is readable as well as maintainable. Here are a few suggestions for SQL code to make it readable.
- Whitespace is your friend. Use it a lot!
- Align and indent your code blocks. It helps you organize your logic.
- Uppercase keywords. Words in all uppercase jump off the page when trying to figure out what you wrote.
- Use PASCAL or Camel case for variable and alias names. This is a good practice regardless of the language you use.
- Don’t align commas, align your logic. I don’t care how SSMS aligns the code it generates, aligned commas make your code hard to read.
- All editors are not created equal; use spaces not tabs for your alignment. When you upgrade or change editors, your spacing will be really weird if you mix the two.
I always write multiple versions of queries with CTEs that will be used in production. This allows me to test for accuracy and performance. I also test sub-queries independently of the full query. Enterprise-level queries and the associated reports are often used to make significant, corporate decisions. Your users’ experience will rely on accuracy and performance.
I hope you found this post valuable. Common table expressions are powerful tools. I strongly suggest you learn to use them, but use them properly.
Russel, still coding.