Table expressions are queries that represent a relational table in SQL Server. They can be interacted with in the same way you would do a normal table. Table expressions are not persistent objects in the database. When they get used within SQL statements the underlying objects making up the table expression can be inlined within the statement. Depending on the complexity of the table expressions this can result in effective or terrible execution plans. As a table expression represents a relation it must adhere to a few rules:
- The table expression must provide column names for each column specified;
- The table expression cannot have two columns with the same name;
- An ORDER BY clause is not allowed unless the TOP or OFFSET-FETCH is used. SQL is based on multiset theory and a set has no order.
SQL Server supports four different types of table expression; a derived table, common table expression also known as a CTE, view, and an inline table-value function also known as ITVF. Table expressions are commonly used to simplify complex sections of code into logical parts and provide code reuse.
Derived Table
A derived table is a subquery with a name and is specified in the FROM clause of a SQL statement. Because it is defined in a SQL statement it is only available within the scope of the statement that defined it. Derived tables can be useful as you can include complex logic within the derived table and refer to the columns alias of the derived table directly in the outer query.
To showcase let’s start with a simple query which doesn’t use a derived table. We will use the StackOverFlow2010 database as our source. The query counts the number of comments made by users who have put their location as Africa.
USE StackOverflow2010
GO
SELECT
u.id,
u.DisplayName,
COUNT(c.UserId) AS CommentNumber
FROM
dbo.users u
LEFT JOIN dbo.Comments c ON u.Id = c.UserId
WHERE
u.location IN ('Africa')
GROUP BY
u.Id,
u.DisplayName
ORDER BY
u.DisplayName;
There are five users returned with a varying number of comments.
We can refactor this query to use a derived table. We can replace the LEFT JOIN to the comment table and instead write a subquery to work out everything we need for the comment table. This removes the need for the GROUP BY on the outer query as we have already aggregated the number of comments by user.
USE StackOverflow2010
GO
SELECT
u.id,
u.DisplayName,
COALESCE(CommentNumber, 0) AS CommentNumber
FROM
dbo.users u
LEFT JOIN (
SELECT
UserId, count(*) AS CommentNumber
FROM
dbo.Comments
GROUP BY
UserId
) c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
The query returns the exact same data.
Let’s see what happens when we don’t adhere to the rules we described at the beginning of the article. The first rule specifies you must provide a column alias for each column in the derived table. In the query I have removed the column alias for the Count(*) within the derived table.
SELECT
u.id,
u.DisplayName
FROM
dbo.users u
LEFT JOIN (
SELECT
UserId,
count(*) -- No Column Alias
FROM
dbo.Comments
GROUP BY
UserId
) c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
If we execute this statement we get the following error.
No let’s specify a column twice in the derived table.
SELECT
u.id,
u.DisplayName,
COALESCE(CommentNumber, 0) AS CommentNumber
FROM
dbo.users u
LEFT JOIN (
SELECT
-- Specified UserId Twice
UserId, UserId,
count(*) AS CommentNumber
FROM
dbo.Comments
GROUP BY
UserId
) c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
This time we get a different error.
Finally let’s add an ORDER BY into the derived table.
SELECT
u.id,
u.DisplayName,
COALESCE(CommentNumber, 0) AS CommentNumber
FROM
dbo.users u
LEFT JOIN (
SELECT
UserId, count(*) AS CommentNumber
FROM
dbo.Comments
GROUP BY
UserId
-- added an ORDER BY into the INNER statement
ORDER BY
UserId
) c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
Once again the database throws back an error.
Common Table Expressions (CTE)
Like a derived table a CTE is a named query which can be used within your SQL statements. Just like the derived table it is only available within the scope of the statement that defined it. Instead of having the query defined within the FROM statement like the derived table we define it at the start of our query using the WITH operator. Each CTE statement must be given a name. We are allowed to define any number of CTE statements under the WITH clause. We separate these statements with a comma and each statement is enclosed in brackets.
Let’s start by rewriting the queries from the derived tables example using a CTE.
USE StackOverflow2010
GO
;WITH c AS (
SELECT
UserId, count(*) AS CommentNumber
FROM
dbo.Comments
GROUP BY
UserId
)
SELECT
u.id,
u.DisplayName,
COALESCE(CommentNumber, 0) AS CommentNumber
FROM
dbo.users u
LEFT JOIN c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
The query starts with the WITH command and then the alias of our CTE, c in this case. Within brackets we specify our query. After this section we start our SQL SELECT statement as normal. When we come to the FROM section we can join to the CTE using the alias we specified. People will sometimes write ;WITH instead of WITH. The WITH operator in SQL needs to have the previous statement ended correctly with the semi-colon. Since a lot of people do not use the semi-colon with any consistency this ensures that an error is not raised if a in-properly formed statement is added before the WITH statement.
CTE vs Dervied Table
In terms of performance there is no difference between a CTE and a derived table. SQL is a declarative language and the database engine will just in-line either one. If we look at the execution plan of the CTE and the derived table queries the plans are exactly the same.
This begs the question does it matter if I write using a CTE or a derived table. In my opinion the CTE structure has multiple advantages over a derived table:
- Statement reuse. Sometimes you will want to reference a named subquery more than once. With a derived table you would have to write the statement out each time you want to use it. In the future if you had to change that derived table you would need to change it each time it was specified making maintenance a bigger issue. With a CTE you can define it once at the top and reference it any number of times you want through your query. It is important to note that each time you reference the CTE the full SQL statement within the CTE will be run. It does not run once and persist the results for each call.
- Readability. Defining all of our named subqueries at the top of the statement makes for easier readability. It allows us to modularise our statements into small specific sections.
- Nesting Statements. If you nest multiple derived tables the code would look messy, complex, and difficult to read. Using CTE’s we can just define the statements one after another with the need for indentation.
USE StackOverflow2010;
GO
-- Nested query written using derived tables
SELECT inner_third.*
FROM (
SELECT inner_second.*
FROM (
SELECT inner_first.*
FROM (
SELECT TOP (5) * FROM dbo.Users
) AS inner_first
) AS inner_second
) AS inner_third;
GO
-- Nested query written using CTE's
WITH inner_first AS (
SELECT TOP (5) * FROM dbo.users
),
inner_second AS (
SELECT * FROM inner_first
),
inner_third AS (
SELECT * FROM inner_second
)
SELECT * FROM inner_third;
GO
Recursive CTE
CTE’s also come in a second form; a recursive CTE. This is something you cannot do with a derived table. A recursive CTE is a CTE that references itself. By doing so the CTE repeatedly executes until the recursive definition is complete. This is useful for querying hierarchical data such as manager -> employee organisation structures where the hierarchy can be ragged.
Let’s see an example in action that looks at a series of employees. Each employee has a manager unless they are the top dog where they don’t report to anyone. We will use the CTE to work out the level of management each employee sits at.
IF OBJECT_ID('tempdb..#RecursiveCTEExample') IS NOT NULL
DROP TABLE #RecursiveCTEExample;
CREATE TABLE #RecursiveCTEExample (
EmployeeId int NOT NULL,
EmployeeName varchar(30) NOT NULL,
ManagerId int NULL
);
INSERT INTO #RecursiveCTEExample (
EmployeeId, EmployeeName, ManagerId
)
VALUES
(1, 'Peter Smith' , NULL),
(2, 'Adam Ant' , 1),
(3, 'Emily Holmes' , 1),
(4, 'Sandra Taylor' , 2),
(5, 'Louise Askew' , 4),
(6, 'Pierre Flowers' , 4),
(7, 'Susan Thompson' , 4),
(8, 'Alex McQueen' , 7),
(9, 'Cattie Palmer' , 7);
The recursive CTE has two SELECT sections which must be connected by a SET operator (in this case UNION ALL). The first SELECT is known as the anchor member. This SELECT gets called the first time the query is run. The second SELECT is known as the recursive member. This query gets run every time the CTE is called after the first call. In this second SELECT we join back to the CTE itself based on the managerid and the employeeid. By doing this the CTE will run until the full employee / manager hierarchy is complete.
WITH recursiveCTEExample AS (
-- This is our anchor query
SELECT
anchor.EmployeeId, anchor.EmployeeName, anchor.ManagerId,
1 AS ManagerialLevel
FROM
#RecursiveCTEExample anchor
WHERE
anchor.ManagerId IS NULL
UNION ALL
-- This is our recursive query
SELECT
main.EmployeeId, main.EmployeeName, main.ManagerId,
cte.ManagerialLevel + 1 AS ManagerialLevel
FROM
#RecursiveCTEExample main
INNER JOIN recursiveCTEExample cte ON cte.EmployeeId = main.ManagerId
)
SELECT
cte.EmployeeId, cte.EmployeeName, cte.ManagerId, cte.ManagerialLevel
FROM
recursiveCTEExample AS cte;
We return the employee list with the managerial level of each employee shown.
Views
Where derived tables and CTE’s are tools we can only use within the statement we define them in a view is different. A view is a virtual table whose contents are defined by a SQL SELECT query. A simple view is not persisted within the database; the contents of the view will be produced whenever the view is referenced. A view needs to be defined before it can be used and it’s definition will be stored in the database.
Let’s take the comment subquery as an example and create it into a view.
USE StackOverflow2010;
GO
CREATE VIEW dbo.CommentsView AS
SELECT
UserId, count(*) AS CommentNumber
FROM
dbo.Comments
GROUP BY
UserId;
We create a view with the CRATE VIEW statement. Under this we write our SQL statement as normal. Executing this query will add the view definition to the database. Once we have defined our view we can reference it just like a normal table. Remember the view will be inlined into the full statement so a complicated / nested views can cause issues with execution plans. Let’s bring the original query together using the view.
SELECT
u.id,
u.DisplayName,
COALESCE(CommentNumber, 0) AS CommentNumber
FROM
dbo.users u
LEFT JOIN dbo.CommentsView c ON u.id = c.userid
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName;
There are many advantages of using views:
- Can be reused any number of times by SQL queries;
- Hide complex queries;
- Build in business logic. As an example you may always write a query which looks at active users from a user table. You can create a view which defines this. This way if the definition of an active user changes it only has to be changed once in the view rather than in every query that works out active users;
- Provide base tables / columns with more meaningful names. Sometimes naming conventions make no sense to anybody. Views allow us provide better names to tables and columns making them easier for end users to use;
- Support legacy code. If you need to change a table structure which would result in lots of queries breaking you can replace the table name with a view of the old schema. You can then change the table without breaking the legacy code;
- Security advantages such as restricting a user from accessing all columns in a table.
Inline table valued functions (ITVF)
Sometimes you want a reusable SQL query which also has the ability to pass in parameters. This is where the ITVF comes in. Like the view an ITVF is something we define and store within the database and can reuse multiple times with our different queries.
Let’s create an ITVF. We will define one to bring back the latest posts for a user. We will define two parameters for this ITVF. The first parameter will be the userid of the user we are looking for. The second parameter will be the number of posts we want to bring back for that user.
USE StackOverflow2010;
GO
CREATE FUNCTION dbo.GetLatestNCommentsByUser (@userId INT, @n AS int) RETURNS TABLE
AS
RETURN
SELECT
TOP (@n)
UserId,
CreationDate,
Score,
Text
FROM
dbo.Comments
WHERE
UserId = @userId
ORDER BY
CreationDate DESC;
GO
We can select from the ITVF with the following. This will bring back the latest 4 posts for user id = 1.
SELECT *
FROM DBO.GetLatestNCommentsByUser(1, 4);
APPLY Operator
You can use the APPLY operator in a SQL query to join a table to a ITVF so the function is evoked for each row returned from the table. We can use either CROSS APPLY (think inner join) or OUTER APPLY (think left join). Let’s add an OUTER APPLY into our original query retrieving users from Africa to get the latest 4 posts per user.
USE StackOverflow2010;
GO
SELECT
u.id,
u.DisplayName,
c.*
FROM
dbo.users u
OUTER APPLY DBO.GetLatestNCommentsByUser (u.id, 4) AS c
WHERE
u.location IN ('Africa')
ORDER BY
u.DisplayName, c.CreationDate;
As we used the OUTER APPLY we have blank comments for the users samin and Wilmar who have not commented.
Final thoughts
In this article we have touched on the four table expressions available in SQL Server; derived table, CTE, views, and ITVF. For each one we have touched on the syntax and reasons why you may want to implement them into your SQL code.