22nd May 2020 11 minutes read Subquery vs. CTE: A SQL Primer Kamil Bladoszewski sql learn sql subqueries Table of Contents What Is a Subquery? What Is CTE? No Difference at All...? Subquery vs CTE: What’s the Difference? Difference #1: CTEs can be recursive Difference #2: CTEs are reusable Difference #3: CTEs can be more readable But Subqueries Are Sometimes Irreplaceable Filtering with a subquery Subqueries can act like columns Correlated subqueries One More Difference: CTEs Must Be Named Subqueries vs CTEs – Which Is Better? Have you ever wondered about the differences between a subquery and a common table expression (CTE) in SQL? The concepts seem very similar, but knowing the difference – and when to use each one – will help you write efficient and readable queries. First, we’ll explain the concepts underlying subqueries and CTEs. Then we’ll look at a few examples and finally analyze the pros and cons of each technique. What Is a Subquery? A subquery is a query within a query. We can use it in multiple ways: in the FROM clause, for filtering, or even as a column. To use a subquery, simply add parentheses and put the query inside them. In our first examples, we’ll work with data on the results of a long jump competition. We’re given two tables: participant – Stores information about contest participants: id – A unique ID for each participant. first_name – The participant’s first name. last_name – The participant’s last name. jump – Stores information about the jumps made by the participants: id – The ID of each jump. participant_id – The ID of the participant who made the jump. contest_id – The ID of the contest in which the jump was made. length – Jump length, in centimeters. participant idfirst_namelast_name 1AmishaBrown 2JamaalSanford 3HibaCameron jump idparticipant_idcontest_idlength 111667 212745 313723 421736 522669 623508 731664 832502 933739 Since you know the data we’re using, take a look at the following subquery examples: SELECT first_name, last_name, length FROM participant JOIN jump ON jump.participant_id = participant.id WHERE length > ( SELECT AVG(length) FROM jump ); This query shows participants with their jumps that were longer than the average jump length. In the WHERE condition, we’ve used a subquery to get the average jump length. Since there’s only one value returned by this subquery, we can easily compare column data to it. Next, another example: SELECT MAX(avg_length) AS max_length FROM ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name ) AS average_lengths; In this query, we’re showing the largest average jump length by any participant. To get the result, we first compute the average jump length for each participant. We’ve done this by using a subquery inside the FROM clause. Then, we simply use MAX() to return the largest average length. These are only two examples of subqueries. It’s a broad topic – even though the use cases are rather simple – and there are too many concepts to be described in this article. A short overview of subqueries can be found in the article SQL Subqueries on the LearnSQL.com blog. If you’re interested in working with subqueries, check out part 6 of SQL Basics course (it's even called Subqueries). You can also watch episodes of our We Learn SQL series on Youtube. Several of them have been dedicated to SQL Subqueries. Remember to subscribe to our channel. What Is CTE? A common table expression (called CTE for short) is a query which we create before writing the main query. Then, we can simply use it as a regular table inside our code. Look at the following example. Once again, we’re using the data from the long jump contest: WITH average_lengths AS ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name ) SELECT MAX(avg_length) AS max_length FROM average_lengths; This returns exactly the same result as the previous example with the subquery: the largest average jump length by any participant. However, instead of writing our query inside the FROM clause, we’ve put it in the WITH clause (which comes before the main query). We’ve called our CTE average_length and used it in the FROM clause in the main query. Of course, CTEs can be far more complicated than this example. But we won’t discuss that here. If you’d like to learn about CTEs, check out the Recursive Queries course at LearnSQL.com. No Difference at All...? At first, you might think that there’s almost no difference between subqueries and CTEs. We’ve used both a subquery and a CTE in the FROM clause and the syntax was only a little different. However, don’t forget the first example – we used a subquery in the WHERE clause there. You couldn’t use a CTE there, and that’s not the only difference! Subquery vs CTE: What’s the Difference? Of course, this doesn’t mean that CTEs are inferior to subqueries. Let’s examine the differences between the two, starting with CTEs. Difference #1: CTEs can be recursive Let’s take a look at the first advantage of CTEs. CTEs allow you to use a powerful concept: recursion. Thanks to recursion, SQL is now Turing complete – every program which can be written in any programming language can also be written in SQL. (If you have doubts that SQL is a programming language, check out Is SQL a programming language? on the LearnSQL.com blog.) How does the recursion work in SQL? It allows your CTE to call itself until a specified ending condition is met. In each step, the query expands itself and changes the data it has. Let’s see an example. We’ll work with the following data for some fictional company employees. It’s stored in the employee table, which contains the following columns: id – A unique ID for each employee. first_name – The employee’s first name. last_name – The employee’s last name. manager_id – The ID of that employee’s manager. employee idfirst_namelast_namemanager_id 1MaisyBloomNULL 2CaineFarrow1 3WaqarJarvis2 4Lacey-MaiRahman2 5MerrynFrench3 Now, we’d like to show the chain of management from the CEO (a person without a value in the manager_id column) to each employee. The query below will solve this problem. Take a look: WITH RECURSIVE employee_chain AS ( SELECT id, first_name, last_name, first_name || ' ' || last_name AS chain FROM employee WHERE manager_id IS NULL UNION ALL SELECT employee.id, employee.first_name, employee.last_name, chain || '->' || employee.first_name || ' ' || employee.last_name FROM employee_chain JOIN employee ON employee.manager_id = employee_chain.id ) SELECT first_name, last_name, chain FROM employee_chain; The result will look like this: first_namelast_namechain MaisyBloomMaisy Bloom CaineFarrowMaisy Bloom->Caine Farrow WaqarJarvisMaisy Bloom->Caine Farrow->Waqar Jarvis Lacey-MaiRahmanMaisy Bloom->Caine Farrow->Lacey-Mai Rahman MerrynFrenchMaisy Bloom->Caine Farrow->Waqar Jarvis->Merryn French We have written a query that can easily create a whole chain of relations. You might think that this could be achieved with subqueries, but as the management chain goes deeper and deeper, you’d have to write more and more code. The amount of code you’d have to write would depend on the depth of the chain – and that can only be checked with a recursive CTE. How does this query work? It starts by running the first part (before the UNION ALL) and selects an employee without a manager (i.e. Maisy Bloom). Then the part beneath the UNION ALL selects the employee(s) directly managed by Maisy (Caine Farrow). Since the query is calling itself, it then runs the same part again and selects all the employees managed by Caine (Waqar Jarvis and Lacey-Mai Rahman). It repeats this operation as long as it has rows to join. After traversing the whole management chain, the query halts. If this is your first encounter with recursion in SQL, it might be a little hard to understand. And that’s totally normal. Check out Do it in SQL: Recursive SQL Tree Traversal for a more detailed explanation. Difference #2: CTEs are reusable A huge advantage of CTEs is that they can be used multiple times in a query. You don’t have to copy the whole CTE code – you simply put the CTE name. Using the data from the previous section, we’d like to 1) filter out the employees who don’t have a manager and then 2) show each employee with their manager – but only if they have a manager. The result will look like this: first_namelast_namefirst_namelast_name WaqarJarvisCaineFarrow Lacey-MaiRahmanCaineFarrow MerrynFrenchWaqarJarvis Now, let’s see how a CTE would solve this task: WITH not_null_manager AS ( SELECT * FROM employee WHERE manager_id IS NOT NULL ) SELECT nnm1.first_name, nnm1.last_name, nnm2.first_name, nnm2.last_name FROM not_null_manager AS nnm1 JOIN not_null_manager AS nnm2 ON nnm1.manager_id = nnm2.id; Now let’s see how a subquery would achieve the same result: SELECT nnm1.first_name, nnm1.last_name, nnm2.first_name, nnm2.last_name FROM ( SELECT * FROM employee WHERE manager_id IS NOT NULL ) AS nnm1 JOIN ( SELECT * FROM employee WHERE manager_id IS NOT NULL ) AS nnm2 ON nnm1.manager_id = nnm2.id; As you can see, the CTE query has less code. It’s more readable, too: you simply repeat the CTE name (not_null_manager) instead of a whole chunk of code. There’s not really much difference in the performance efficiency between these two queries. Even though you only declare the CTE once, the execution time is almost the same. Difference #3: CTEs can be more readable So, you know that you can write less code using CTEs. What about code organization? Here’s another example that focuses on the FROM clause. Do you still remember the first examples? The ones where returning the largest average jump length? If not, here’s a quick review. This one uses a subquery: SELECT MAX(avg_length) AS max_length FROM ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name ) AS average_lengths; And this one uses a CTE: WITH average_lengths AS ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name ) SELECT MAX(avg_length) AS max_length FROM average_lengths; Which one do you think is more readable? I’d say the second one (CTE) is a bit better. The code is less nested and there’s a small layer of abstraction. You know at first glance that this query selects the maximal average length. CTEs’ readability is shown even more in the examples from the previous section. And when your queries grow bigger and bigger, using CTEs will significantly increase the readability of your code. And readability is key to the efficient development of code. If you want to learn more about using CTEs to improve the readability of your code, you should definitely check out the article How to Organize SQL Queries with CTEs. But Subqueries Are Sometimes Irreplaceable So far, you’ve learnt quite a few differences between subqueries and CTEs. To be honest, I was trying to convince you that CTEs are much better than subqueries. But in this section, you’ll learn why subqueries can be invaluable. Filtering with a subquery This article’s first example used a subquery in the WHERE clause. I didn’t show a similar example in the CTE section. That’s because only subqueries can be used in the WHERE clause! What’s more, there are quite a few keywords you can use in the WHERE condition – e.g. ALL, ANY, EXISTS, and a few more! Unfortunately, I can’t explain them here; it’d take too much time. Instead, I recommend checking out the Subqueries section of our SQL Basics course. You’ll not only learn about these keywords, you’ll solve a few problems using them! Or check out the article SQL Subqueries in our blog if you want a brief explanation. Subqueries can act like columns You can also use subqueries as you would a column. The only constraint is that the subquery must return only one value. Take a look: SELECT DISTINCT contest_id, ( SELECT COUNT(length) FROM jump AS inside_jump WHERE inside_jump.contest_id = outside_jump.contest_id AND inside_jump.length > 600 ) AS longer_jumps, ( SELECT COUNT(length) FROM jump AS inside_jump WHERE inside_jump.contest_id = outside_jump.contest_id AND inside_jump.length <= 600 ) AS shorter_jumps FROM jump AS outside_jump; For each contest, this query returns the number of jumps longer than 600cm (computed in the first subquery) and the number of jumps shorter than or equal to 600cm (computed in the second subquery). The result will look like this: contest_idlonger_jumpsshorter_jumps 130 221 321 Correlated subqueries Look at the above example again. Did you notice that I used a reference to a table in the outer query inside the subquery? I even referred to the value of the current row from that table. This is called a “correlated subquery”. It allows you to use values from the outer query inside the subquery. This is a very handy technique, but it’s also quite complicated; we won’t explain it in this article. However, feel free to check out Correlated Subquery in SQL: A Beginner’s Guide in our blog for an explanation. One More Difference: CTEs Must Be Named The last difference between CTEs and subqueries is in the naming. CTEs must always have a name. On the other hand, in most database engines, subqueries don’t require any name (the only exception is the FROM clause in my favorite database engine, PostgreSQL). It’s a good practice to name subqueries placed in the FROM or SELECT clauses, but it’s not a requirement. And, to be precise, you can’t name the subqueries you use in the WHERE clause. You might think that naming is not a big difference and won’t affect you much. However, you may have to quickly check something in the database. In this case, the easier syntax might be your choice. Even if the query is less readable, keep this in mind – such queries are rarely read after they’re used. Subqueries vs CTEs – Which Is Better? You’ve learnt a lot about the differences between CTEs and subqueries. So, then, which is better? The answer is neither, either, or it depends – both subqueries and CTEs have pros and cons. Each query should be analyzed and the choice between these two decided on a case-by-case basis. But to do this, you’ll have to thoroughly learn both concepts. To learn more about subqueries, I can suggest the Subqueries part of LearnSQL’s SQL Basics course. If you’d like to learn more about CTEs, the Recursive Queries course is your best bet. These courses will help you quickly learn these concepts. Thus, you’ll be able to decide which queries benefit from CTEs and which call for subqueries. However, if you’re already somewhat familiar with correlated subqueries and don’t want to learn the same thing once again, you might also want to sharpen your skills in our SQL Practice Set course. Tags: sql learn sql subqueries