14th May 2024 18 minutes read 11 SQL Common Table Expression Exercises Gustavo du Mortier CTE online practice Table of Contents The Value of CTEs Simple CTE Exercises Exercise 1: Using a CTE to Get Totalized Data Exercise 2: Using Multiple CTEs in the Same Statement Nested CTEs Exercise 3: Using Nested CTEs to Move Incrementally Toward a Result Exercise 4: Combining Nested Queries and Subqueries Exercise 5: Using Nested CTEs to Compute Complex Statistics Exercise 6: Using Nested CTEs to Compare Sets of Elements Recursive CTEs Exercise 7: Using Recursive CTEs to Generate Sequences Exercise 8: Using a Recursive CTE to Traverse a Hierarchical Data Structure (Tree) Exercise 9: Show the Path of a Tree Starting from the Root Node Exercise 10: Using Multiple Recursive Columns Exercise 11: Using Recursive CTEs to Traverse Network-type Data Structures Need More Common Table Expression Exercises? In this article, we offer you 11 practice exercises that put your knowledge of common table expressions (CTEs) into practice. Each CTE exercise is accompanied by a solution and a detailed explanation. As the saying goes, “A little progress each day adds up to big results”. And that is undoubtedly true for SQL proficiency. Just as you should regularly go to the gym to keep your muscles toned, you should do common table expressions exercises often to keep your querying skills toned for heavy data analysis work. CTEs, or common table expressions, are powerful statements in SQL. They allow you to temporarily define a subquery within a SQL query and assign a name to it. This means the CTE can then be referred to by its name within the query – just as if it were a table. The Value of CTEs In addition to names, there are other important differences between CTEs and subqueries. CTEs are handy for structuring complex queries, making them easier to read, understand, and debug. CTEs can also be used repeatedly within the same query, making it more concise. These qualities of CTEs make them ideal for data analysis work, as it is common for data analysis queries to mix simple data SELECTs with grouped SELECTs with totals, averages, and other aggregate functions. Without CTEs, complex queries could become virtually impossible to read or require the creation of temporary tables or views that unnecessarily populate the database with lots of objects. CTEs are also very useful when performing highly complex data analysis without resorting to other programming languages. Using recursion to find critical paths or to traverse abstract data types (such as trees and graphs) are examples that demonstrate CTEs’ usefulness. So is the possibility of creating nested queries, which gradually reduce the complexity of a problem until it becomes a simple SELECT. CTEs’ ability to make queries more readable and concise is something you will appreciate if you need to review a long query years after writing it. Fortunately, most modern relational database management systems (RDBMSs) – including PostgreSQL, MySQL, SQL Server, and Oracle – allow the use of CTEs. All the exercises compiled in this article were taken from our Recursive Queries course. It’s an in-depth tutorial on common table expressions in data analysis. The 114 interactive exercises cover simple CTEs, nested CTEs, and recursive CTEs in a total estimated time of 18 hours. You can also learn more about CTEs by reading CTEs Explained with Examples. Another important topic that every data analyst should master is SQL window functions. You can check out this set of SQL window function practice exercises to put your skills to the test. Now, let’s get started with our common table expression exercises. We’ll start with simple CTEs and then move on to nested and recursive CTEs. Simple CTE Exercises For these CTE exercises, we will use a database designed to manage crowdfunding projects. This schema is composed of three tables: supporter contains information about supporters, who are those who donate money for the projects. project contains information about the projects that receive donations from supporters. donation records the donations from supporters to the projects. The supporter table stores the id, first_name, and last_name of each supporter in the system. Let’s see some of its rows: idfirst_namelast_name 1MarleneWagner 2LonnieGoodwin 3SophiePeters 4EdwinPaul 5HughThornton The project table stores id, category, author_id, and the minimal_amount needed to start each project. These are some of its rows: idcategoryauthor_idminimal_amount 1music11677 2music521573 3traveling24952 4traveling53135 5traveling28555 The data in the author_id column links each project in the project table with a row of the supporter table. Each supporter related to a project by the column author_id is the author of that project. Finally, the donation table contains id, supporter_id, the amount of the donation, and the column donated, showing the date each donation was made. idproject_idsupporter_idamountdonated 144928.402016-09-07 2818384.382016-12-16 3612367.212016-01-21 4219108.622016-12-29 51020842.582016-11-30 Exercise 1: Using a CTE to Get Totalized Data Exercise: Obtain the project ID, minimal amount, and total donations for projects that have received donations over the minimum amount. Solution: WITH project_revenue AS ( SELECT project_id, SUM(amount) AS sum_amount FROM donation GROUP BY project_id ) SELECT project.id, minimal_amount, sum_amount FROM project_revenue INNER JOIN project ON project.id = project_revenue.project_id WHERE sum_amount >= minimal_amount; Explanation: To solve this exercise, we use a CTE called project_revenue that totals the donations of each project. This CTE has two columns: id and sum_amount, the latter being the calculated sum of donations for each project_id. After the CTE definition, we use a SELECT statement that joins the project table with the CTE. For each project that has received donations, the CTE returns its id, minimal_amount, and the total donations (sum_amount) it received. The CTE project_revenue only includes rows for projects that have received donations because it gets data from the donation table. The SELECT below the CTE definition also shows only projects that received donations because of the INNER JOIN between the CTE and the project table. And the WHERE condition ensures that we will get only projects for which the amount donated exceeds the minimum amount. If you need practice in grouping data in SQL, check out this set of 10 GROUP BY exercises. Try these advanced SQL practice exercises to speed your way to SQL proficiency. Exercise 2: Using Multiple CTEs in the Same Statement Exercise: Select supporters who donated more than $200 total or who donated at least twice. Solution: WITH rich AS ( SELECT s.id, first_name, last_name FROM supporter s JOIN donation d ON d.supporter_id = s.id GROUP BY s.id, first_name, last_name HAVING SUM(amount) > 200 ), frequent AS ( SELECT s.id, first_name, last_name FROM supporter s JOIN donation d ON d.supporter_id = s.id GROUP BY s.id, first_name, last_name HAVING COUNT(d.id) > 1 ) SELECT id, first_name, last_name FROM rich UNION ALL SELECT id, first_name, last_name FROM frequent; Explanation: This exercise asks us to combine two different results that we must obtain by retrieving information from the donation and supporter tables: supporters whose total donations exceed $200 and supporters who made more than one donation. This situation is ideal to solve by writing two CTEs, one to obtain the first set of data (rich) and the other to obtain the second set (frequent). SQL syntax supports writing multiple CTEs within the same command, which we took advantage of to solve this exercise. By placing each subquery in a different CTE, the final SELECT is simply the union of two simple SELECTs – each of which fetches data directly from one CTE. Nested CTEs Although no RDBMS allows the creation of a CTE within another CTE, what they do allow is nested CTEs; this is when a CTE refers to a previously-defined CTE as if it were a table. In this way, CTEs create different levels of abstraction. This makes the final query a simple and concise SELECT. For our nested CTE exercises, we will use a table schema of a door-to-door sales company. This schema has three tables: salesman, daily_sales, and city. The salesman table includes the id, first_name, last_name, and city_id for each salesman. These are some of its rows: idfirst_namelast_namecity_id 1FrederickWebster1 2CaseySantiago2 3CindyFields3 4TimothyPratt4 5SusanRose5 The daily_sales table represents the totalized sales per day and salesman. It has the columns day, salesman_id, items_sold, amount_earned, distance, and customers. The latter two columns show the distance traveled and the number of customers served by each salesman each day. These are a few of its rows: daysalesman_iditems_soldamount_earneddistancecustomers 2017-01-15101673.203020 2017-01-152162288.4913613 2017-01-153171232.7812914 2017-01-15421496.882512 2017-01-155221384.1334018 Finally, we have the city table that stores the id, name, country, and region of each city: idnamecountryregion 1ChicagoUSAAmericas 2New YorkUSAAmericas 3Mexico CityMexicoAmericas 4Rio de JaneiroBrasilAmericas 5ParisFranceEurope Exercise 3: Using Nested CTEs to Move Incrementally Toward a Result Exercise: Get the date, city ID, city name, and total amount of all daily sales – grouped by date and city – that exceed the average daily sales for all cities and all days. Solution: WITH earnings_per_day_city AS ( SELECT ds.day, c.id, c.name, SUM(amount_earned) AS total_earnings FROM salesman s JOIN daily_sales ds ON s.id = ds.salesman_id JOIN city c ON s.city_id = c.id GROUP BY ds.day, c.id, c.name ), overall_day_city_avg AS ( SELECT AVG(total_earnings) AS avg_earnings FROM earnings_per_day_city ) SELECT day, id, name, total_earnings FROM earnings_per_day_city, overall_day_city_avg WHERE total_earnings > avg_earnings; Explanation: Nested CTEs let us break a problem into parts and gradually approach the solution. In this exercise, we first need to total the sales per day and per city. We do that with the first CTE, earnings_per_day_city. Then we need to get an average of all totalized sales per day and per city. We achieve this with the overall_day_city_avg CTE, which in turn uses the results previously totaled by the earnings_per_day_city CTE. This second CTE will return a single row with the average sales for all days and all cities. In the final SELECT, we simply take the data from the two CTEs (no need to combine them with a JOIN, since earnings_per_day_city returns a single row) and add the WHERE condition that the total sales of the day and the city must be greater than the overall average. Exercise 4: Combining Nested Queries and Subqueries Exercise: Obtain the date on which the average number of customers served per region was the lowest of all, displaying this average along with the date. Solution: WITH sum_region AS ( SELECT day, region, SUM(customers) AS sum_customers FROM salesman s JOIN daily_sales ds ON s.id = ds.salesman_id JOIN city c ON s.city_id = c.id GROUP BY day, region ), avg_region AS ( SELECT day, AVG(sum_customers) AS avg_region_customers FROM sum_region GROUP BY day ) SELECT day, avg_region_customers FROM avg_region WHERE avg_region_customers = (SELECT MIN(avg_region_customers) FROM avg_region); Explanation: To solve this query we use the same successive approach to the solution as in the previous exercise, creating first a CTE to obtain the total number of customers served per day and per region and then another CTE based on the previous one to obtain the daily averages of customers served per day. Then, in the final SELECT, we use a subquery to get the minimum of the average number of customers per day and use it in the WHERE clause as the comparison value, so that the query returns the day that corresponds to that minimum. If you wanted to further break up the final SELECT, you could add a third CTE instead of a subquery. This way, the final SELECT gets even simpler. Here’s the new (third) CTE and the outer SELECT: min_avg_region as ( SELECT MIN(avg_region_customers) as min_avg_region_customers FROM avg_region ) SELECT day, avg_region_customers FROM avg_region, min_avg_region WHERE avg_region_customers = min_avg_region_customers; Exercise 5: Using Nested CTEs to Compute Complex Statistics Exercise: For each city, calculate the average total distance traveled by each salesperson. Also calculate an overall average of all city averages. Solution: WITH distance_salesman_city AS ( SELECT city_id, salesman_id, SUM(distance) AS sum_distance FROM daily_sales d JOIN salesman s ON d.salesman_id = s.id GROUP BY city_id, salesman_id ), city_average AS ( SELECT city_id, AVG(sum_distance) AS city_avg FROM distance_salesman_city GROUP BY city_id ) SELECT AVG(city_avg) FROM city_average; Explanation: The benefits of nested CTEs are noticeable when you need to perform statistical calculations composed of several successive steps. In this case, the final result is a total average of the averages per city of the sums of distances for each city and salesperson. That is an average of averages of sums. CTEs allow us to make a stepwise approximation to the result, analogous to how a data scientist would do it with statistical formulae. Exercise 6: Using Nested CTEs to Compare Sets of Elements Exercise: Compare the average sales of all salespersons in the USA with the average sales of all salespersons in the rest of the world. Solution: WITH cities_categorized AS ( SELECT id AS city_id, CASE WHEN country = 'USA' THEN country ELSE 'Rest of the World' END AS category FROM city ), sales_category AS ( SELECT category, salesman_id, SUM(items_sold) total_sales FROM daily_sales ds JOIN salesman s ON s.id = ds.salesman_id JOIN cities_categorized ON cities_categorized.city_id = s.city_id GROUP BY category, salesman_id ) SELECT category, AVG(total_sales) FROM sales_category GROUP BY category; Explanation: In the first CTE (cities_categorized), we separated cities into two groups: cities in the USA and cities in the rest of the world. In the second CTE, we combine information from the cities_categorized CTE with daily_sales and salesman to obtain sales totals grouped by the two categories of cities and by salesman. In the final SELECT, we simply group by city category and get the average sales for each of the two categories (US cities and Rest of the World cities). Recursive CTEs In SQL programming, recursive CTEs are common table expressions that reference themselves. Like recursive functions used in other programming languages, recursive CTEs are based on the principle of taking the data resulting from a previous run, adding to it or modifying it, and passing the results to the next run. We keep doing the same until a stop condition is met, which is when the final result is obtained. Recursive CTEs must have the word RECURSIVE after the word WITH. The best way to understand the operation of recursive CTEs is by using a simple example, as in the following exercise. Exercise 7: Using Recursive CTEs to Generate Sequences Exercise: Use recursion to list all integers from 1 to 10. Solution: WITH RECURSIVE ten_numbers(prev_number) AS ( SELECT 1 UNION ALL SELECT ten_numbers.prev_number + 1 FROM ten_numbers WHERE prev_number < 10 ) SELECT * FROM ten_numbers; Explanation: This query adopts the PostgreSQL recursive CTEs notation, which has four parts: Anchor member: This is where we define the recursion’s starting point. This part of the query must be able to be solved autonomously, without the need to use results from previous iterations of the same CTE. Recursive member: This part is repeated as many times as necessary, using the results of the previous iteration as a basis . Termination condition: This condition is evaluated after each repetition of the recursive member; when it’s met, the recursive loop will end. If this condition were not present or always yielded a true result, the recursion would continue indefinitely. Invocation: The main difference between this SELECT query and other CTE main queries is that this SELECT acts as a trigger for the cycle of recursive executions. In this exercise, the anchor member simply returns a row with the number 1. The recursive member takes the row(s) from the previous execution and appends (via the UNION clause) a new row with the previous value incremented by 1. The termination condition states that the query will continue iterating until the value obtained equals 10. Exercise 8: Using a Recursive CTE to Traverse a Hierarchical Data Structure (Tree) For this exercise, we will use the employee table, which has the columns id, first_name, last_name, and superior_id. Its rows contain the following data: idfirst_namelast_namesuperior_id 1MadelineRaynull 2VioletGreen1 3AltonVasquez1 4GeoffreyDelgado1 5AllenGarcia2 6MarianDaniels2 Exercise: Show all data for each employee, plus a text showing the path in the organization’s hierarchy that separates each employee from the top boss (identified by the literal value 'Boss)'. Solution: WITH RECURSIVE hierarchy AS ( SELECT id, first_name, last_name, superior_id, 'Boss' AS path FROM employee WHERE superior_id IS NULL UNION ALL SELECT employee.id, employee.first_name, employee.last_name, employee.superior_id, hierarchy.path || '->' || employee.last_name FROM employee JOIN hierarchy ON employee.superior_id = hierarchy.id ) SELECT * FROM hierarchy; Explanation: The data in the employee table represents a hierarchical or tree structure, where each row has a column that relates it to its superior (another row in the same table). The row that corresponds to the company boss (the root node of the tree) is the one that has a null value in the superior_id column. Therefore, that is our anchor member to build this recursive CTE. The path of this anchor member simply carries the literal value 'Boss'. Then, the recursive query member joins the previous iteration of hierarchy with employee, setting the condition that the bosses of the employees of the current iteration (superior_id) are already in the hierarchy. This means that, for each iteration, we add another layer to the hierarchy. This layer is formed by the subordinates of the employees that were added in the previous iteration. Hence the join condition is employee.superior_id = hierarchy.id. The path of each employee is assembled by concatenating the path of their boss (hierarchy.path, which shows all the way to 'Boss') with the last name of the employee of the current iteration, joined by a string representing an arrow (hierarchy.path || '->' || employee.last_name). Exercise 9: Show the Path of a Tree Starting from the Root Node Exercise: Display a list containing the first and last name of each employee (including the top boss), together with a text (the path field) showing the path of the tree between each employee and the top boss. In the case of the top boss, the path column should show the last_name of the boss. Solution: WITH RECURSIVE hierarchy AS ( SELECT first_name, last_name, CAST(last_name AS text) AS path FROM employee WHERE superior_id IS NULL UNION ALL SELECT employee.first_name, employee.last_name, hierarchy.path || '->' || employee.last_name AS path FROM employee, hierarchy WHERE employee.superior_id = hierarchy.id ) SELECT * FROM hierarchy; Explanation: The solution of this exercise is very similar to the previous exercise, with the only exception that the path value for the root node is not a literal TEXT-type value; it’s a last_name value in the employee table. This forces us to perform a data conversion to avoid getting an error when we run this query. Since the CTE makes a UNION between the data returned by the anchor component and the data returned by the recursive component, it is imperative that both result sets have the same number of columns and that the data types of the columns match. The last_name column of the employee table (referred as path in the CTE anchor member) is a VARCHAR type, while the concatenation hierarchy.path || '->' || employee.last_name (referred as path in the recursive member) automatically yields a TEXT column. For UNION not to cause a type mismatch error, it is necessary to CAST(last_name AS text) in the anchor member. In this way, the path columns of both parts of the CTE will be TEXT. Exercise 10: Using Multiple Recursive Columns Exercise: List all the data for each employee plus the path in the hierarchy until the top boss is reached, Include a column called distance that shows the number of people in the hierarchy from the top boss to the employee. For the top boss, the distance is 0; for their subordinates, it is 1; for the subordinates of their subordinates, it is 2, and so on. Solution: WITH RECURSIVE hierarchy AS ( SELECT id, first_name, last_name, superior_id, 'Boss' AS path, 0 AS distance FROM employee WHERE superior_id IS NULL UNION ALL SELECT employee.id, employee.first_name, employee.last_name, employee.superior_id, hierarchy.path || '->' || employee.last_name, hierarchy.distance + 1 FROM employee, hierarchy WHERE employee.superior_id = hierarchy.id ) SELECT * FROM hierarchy; Explanation: As there are two recursive columns in the CTE, it is necessary to indicate an initial value for each of them in the anchor member. In this case, the path column has the initial value 'Boss' (as in exercise 8) and the distance column has the value 0. Then, in the recursive member, the distance value results from adding 1 to the distance of the previous iteration. Exercise 11: Using Recursive CTEs to Traverse Network-type Data Structures For this exercise, we use two tables: a destination table (consisting of the columns id and name) and a ticket table (consisting of the city_from, city_to, and cost columns). The destination table contains the IDs and names of a group of cities, while the ticket table indicates the cost of tickets between the pairs of cities in the destination table (where such connections exist). This is sample data from both tables (first destination, then ticket): idname 1Warsaw 2Berlin 3Bucharest 4Prague city_fromcity_tocost 12350 1380 14220 23410 24230 32160 34110 42140 4375 Exercise: Find the cheapest route to travel between all cities in the destination table, starting from Warsaw. The query must show the following columns: path – The names of the cities on the path, separated by '->'. last_id – The id of the final city in this trip. total_cost – The sum of the costs of the tickets. count_places – The number of cities visited. This must be equal to the total number of cities in destination, i.e. 4. Solution: WITH RECURSIVE travel(path, last_id, total_cost, count_places) AS ( SELECT CAST(name as text), Id, 0, 1 FROM destination WHERE name = 'Warsaw' UNION ALL SELECT travel.path || '->' || c2.name, c2.id, travel.total_cost + t.cost, travel.count_places + 1 FROM travel JOIN ticket t ON travel.last_id = t.city_from JOIN destination c1 ON c1.id = t.city_from JOIN destination c2 ON c2.id = t.city_to WHERE position(c2.name IN travel.path) = 0 ) SELECT * FROM travel WHERE count_places = 4 ORDER BY total_cost ASC; Explanation: The method for solving this exercise is similar to the previous exercise. In this case, however, there is no direct order to the relationship between the elements in the same table. Instead, the relationships between the elements of the destination table are expressed in the ticket table, linking each pair of connected cities. The first row of the results of the above query shows the lowest-cost route. This is possible because the outer SELECT of the query sorts the results in ascending order by total_cost. In turn, the outer SELECT makes sure that the number of cities traveled is 4 by setting the condition that count_places equals 4. The recursive member ensures that each iteration adds a new city to the route by setting the condition that the name of the city is not already in the path (position(c2.name IN travel.path) = 0). Since the starting point of the trip is the city of Warsaw, the anchor member of the recursive CTE is the row of the destination table where the name is equal to 'Warsaw'. Note that we convert the name column to the TEXT data type (as in Exercise 9) so that the data type matches the corresponding column in the recursive CTE member. Need More Common Table Expression Exercises? If you've been working through the SQL exercises in this article, you now know how to use common table expressions . And you know how useful CTEs can be. These exercises come from our Recursive Queries course, and there are even more exercises like these in the full course. To get really good at data analysis with SQL, think about joining LearnSQL.com. We recommend checking out the All Forever SQL Package. It gives you lifetime access to all the SQL courses we offer now, plus any new ones we add later. This way, you can keep learning new skills forever. Join LearnSQL.com today and start building a great future in data analysis! Tags: CTE online practice