16th Jul 2024 12 minutes read PostgreSQL CTE: What It Is and How to Use It Jill Thornhill postgresql cte Table of Contents Common Table Expressions in PostgreSQL CTE in PostgreSQL Syntax CTE in PostgreSQL Example Defining CTE Column Names Nested Queries Using PostgreSQL CTE PostgreSQL CTE in Data Manipulation Language Recursive Queries Learn More About PostgreSQL CTEs CTEs, or Common Table Expressions, are a powerful PostgreSQL tool that’s often ignored. This article looks at various PostgreSQL CTEs – including nested and recursive CTEs – and what you can do with them. If you write complex queries in SQL, you’ll soon find that your code becomes cluttered and hard to read. CTEs – also known as WITH clauses – are primarily a way of simplifying queries. However, they also allow you to use recursion. Recursion, among other uses, lets you easily navigate hierarchical structures. PostgreSQL CTEs (or Common Table Expressions) are very similar to subqueries; the difference is that CTEs are named and defined at the top of your code. This lets you break a large query down into small sections. In this article, I’ll work through several examples of PostgreSQL CTEs. I’m assuming that you’re already familiar with writing queries in PostgreSQL. If not, our PostgreSQL Cheat Sheet is well worth downloading. If you think PostgreSQL CTEs will help you in your job, you may like to look at our interactive Common Table Expressions in PostgreSQL course. This course is designed for those who are already familiar with basic SQL. You’ll get lots of practice using PostgreSQL CTEs, thanks to over a hundred interactive exercises. Let’s dig into common table expressions in Postgres! Common Table Expressions in PostgreSQL CTE in PostgreSQL Syntax Let’s now have a closer look at CTE syntax. In its simplest form, it looks like this: WITH cte_name AS (query_1) query_2; cte_name is the name you assign to the CTE. You can refer to this name in your main query or in subqueries, just as you would a table. query_1 is any valid SELECT query_2 is a valid SQL statement. It could be a SELECT, an UPDATE, an INSERT or a DELETE. The results of query_1 will be available as if they were a table. The table name will be the name you specified as cte_name. You can use it in the rest of your query in the same way you use other tables. CTE in PostgreSQL Example Let’s look at an example. Alpha Sales is an online retailer. They want to know if their latest marketing strategy has been effective, and what type of customer has best responded to it. Here is a sample of their order_summary table, which holds the value of each order made in April, May, and June of 2024. order_idcustomer_idorder_datevalue 112024-06-05700 212024-04-18400 312024-05-15500 422024-04-25200 5882024-05-04700 6882024-06-18500 7882024-05-25150 83452024-04-02250 93452024-06-25450 103452024-06-19300 116572024-05-25900 126572024-06-25200 As the first step in analyzing the success of their marketing campaign, the company leaders want to compare the June sales by customer to the average monthly sales by customer for April and May and calculate the percentage change. Of course, you could achieve this using subqueries, but the code would be quite complex. You want to see the previous month average on the report, but also use it in the percentage change calculation. Using a CTE, the query would look like this: WITH april_may_sales AS (SELECT customer_id, SUM(value) / 2 AS prev_avg FROM order_summary WHERE EXTRACT (MONTH FROM order_date) in (4,5) GROUP BY customer_id; ) SELECT order_summary.customer_id, prev_avg, SUM(value) AS jun_total, (SUM(value) - prev_avg) * 100 / prev_avg AS percent_change FROM order_summary JOIN april_may_sales ONapril_may_sales.customer_id = order_summary.customer_id WHERE EXTRACT (MONTH FROM order_date) = 6 GROUP BY order_summary.customer_id, prev_avg ORDER BY customer_id; This query uses the WITH clause to create a virtual table named april_may_sales. It extracts the total sales by customer for the months of April and May, divides the result by 2 to get a monthly average, and stores that information in a column named prev_avg. This table is joined to the order_summary table in the main query so we can look at June’s total alongside the average for April and May. The query produces the following result set: customer_idprev_avgjun_totalpercent_change 1450.00700.0055.56 88425.00500.0017.65 345125.00750.00500.00 657450.00200.00-55.56 Defining CTE Column Names Optionally, you can specifically define the column names for the CTE table using the following syntax: WITH cte_name (column_name_list) AS (query_1) query_2; Here, column_name_list is a list of column names separated by commas. Changing the previous example to use this syntax gives us the following query: WITH april_may_sales (customer_id, prev_avg) AS ( SELECT customer_id, SUM(value) /2 FROM order_summary WHERE EXTRACT (MONTH FROM order_date) in (4,5) GROUP BY customer_id ) SELECT order_summary.customer_id, prev_avg, SUM(value) AS jun_total, (SUM(value) - prev_avg) * 100/prev_avg AS percent_change FROM order_summary JOIN april_may_sales ON april_may_sales.customer_id = order_summary.customer_id WHERE EXTRACT (MONTH FROM order_date) = 6 GROUP BY order_summary.customer_id, prev_avg ORDER BY customer_id; This makes no difference to the query’s output, which remains the same as the previous sample query. It does, however, make it easier for someone else to understand your query. Nested Queries Using PostgreSQL CTE You can define two or more CTEs using one WITH keyword in PostgreSQL. You simply begin by using the WITH keyword and then specify each CTE separated by commas. The syntax looks like this: WITH cte_name_1 AS (query_1), cte_name_2 AS (query_2) query_3; Each CTE has its own name and select statement. Each CTE can refer to any previously-defined CTE to pick up any data it needs. Notice that you don’t repeat the WITH keyword: you just list the CTEs separated by commas. Let’s look at this in action. Suppose Alpha Sales now wants to take this analysis a stage further. They would like to extract demographics for those customers who bought more in June than their average purchases for April and May. To do this, they need to combine the data extracted in the previous query with data from their customer table. Here’s a sample of the data: customer_idprev_avgjun_totalpercent_change 1450.00700.0055.56 88425.00500.0017.65 345125.00750.00500.00 657450.00200.00-55.56 To do this, you can: Move the previous main query to the front as a nested CTE. This effectively creates a virtual table holding the customer_id, the previous average, the June total, and the percentage change. Write a new main query that joins this table with the customer table to calculate the customer’s age and extract his state. The new query looks like this: WITH april_may_sales AS (SELECT customer_id, SUM(value) / 2 AS prev_avg FROM order_summary WHERE EXTRACT (MONTH FROM order_date) in (4,5) GROUP BY customer_id ), comparison AS ( SELECT order_summary.customer_id, prev_avg, SUM(value) AS jun_total, (SUM(value) - prev_avg) * 100/prev_avg AS percent_change FROM order_summary JOIN april_may_sales ON april_may_sales.customer_id = order_summary.customer_id WHERE EXTRACT (MONTH FROM order_date) = 6 GROUP BY order_summary.customer_id, prev_avg ) SELECT customer.customer_id, name, EXTRACT(YEAR from CURRENT_DATE) - EXTRACT(YEAR from date_of_birth) AS age, state, prev_avg, jun_total, percent_change FROM customer JOIN comparison ON comparison.customer_id = customer.customer_id WHERE percent_change > 0; As before, the query defines the CTE named april_may_sales as a virtual table containing the average sales for April and May. It then defines a new CTE named comparison, which contains a comparison of the June totals by customer against the contents of april_may_sales. Finally, the main query combines the data in the comparison virtual table with data from the customer table. The result set looks like this: customer_idnameagestateprev_avgjun_totalpercent_change 1John Smith30KY450.00700.0055.56 88Tinashe Mpofu50ID425.00500.0017.65 345Jennifer Perry26HI125.00750.00500.00 PostgreSQL CTE in Data Manipulation Language Let’s now look at data manipulation statements like INSERT, UPDATE, and DELETE. One of the limitations of CTEs is that you can’t use them directly in place of a value in an UPDATE statement in the same way that you can with a subquery. Let’s say you wanted to update the balance in the customer table by adding in the value of all the June orders. With ordinary subqueries, you can do something like this: UPDATE customer SET balance = balance + (select SUM(value) FROM order_summary WHERE customer.customer_id = order_summary.customer_id AND EXTRACT (MONTH from order_date) = 6); You can’t do that with a CTE. What you can do, however, is use the following syntax: WITH cte_name AS (select_statement) UPDATE tablename SET column_name_1 = column_name_2 FROM cte_name WHERE join_clause; cte_name is the name you’ll use to refer to the ‘table’ created by the CTE. select_statement is the statement you’ll use to populate the CTE. column_name_1 is the column name in the main table that you want updated. column_name_2 is the column name in your CTE that you’ll use to set the new value. join_clause specifies the condition you’ll use to join the two tables. The following query adds the total June orders from the order_summary table to the balance in the customer table: WITH june_total AS (SELECT customer_id, SUM(value) AS jun_tot FROM order_summary WHERE EXTRACT(MONTH FROM order_date) = 6 GROUP BY customer_id ) UPDATE customer SET balance = balance + jun_tot FROM june_total WHERE customer.customer_id = june_total.customer_id; First, the WITH clause creates a pseudo-table named june_total. It contains totals by customer_id of orders where the month of order_date is 6. Next, the column jun_tot from this table is used to increase the balance where customer_id matches between the two tables. The table customer now contains the following data: customer_idnamedate_of_birthstatebalance 1John Smith5/7/1994KY1000 2Shamila Patel14/3/2006CT1000 88Tinashe Mpofu17/4/1974ID500 345Jennifer Perry21/10/1998HI850 657Sarah Jones25/4/1984KY570 You can also use CTEs to insert or delete rows in the same way. Recursive Queries Recursive queries are a feature of CTEs. These queries let you loop from a single base query to repetitively carry out a specified task. They are especially useful for querying hierarchical data such as organizational structures and bills of materials. A full cover of recursive queries is beyond the scope of this article. We’ll just look at the syntax and a simple example. For more details, have a look at What Is a Recursive CTE in SQL, which gives a full explanation and several examples. The syntax of recursive queries in PostgreSQL is: WITH RECURSIVE cte_name AS (query_1 UNION query_2) query_3; The keyword RECURSIVE indicates that this is a recursive query. query_1 is the base query, or starting point. For example, suppose you’re working with an organizational structure. In that case, query_1 could be a query that selects the top-level manager from an employee file. query_2 is the recursive query. This query will be repeated until no more rows meet the criteria specified in the WHERE It can reference the last row added to the result set to pick up data. This could be used to find all employees who report to a manager. UNION combines the results. If you use UNION ALL, duplicates will be retained; otherwise they will be omitted. query_3 is used to return the final result set. It can reference the virtual table created by the CTE. Let’s think about an example of an employee table where employee records have a field identifying the manager they report to. What actually happens if you use a recursive query to navigate this hierarchy? The results of the base query are added to the virtual table. The base query extracts the employee record of the CEO. The database engine then uses this row to find all rows that match the criteria of the recursive part of the query. This will be all employees who report directly to the top-level manager. For each of these records in turn, the engine will find all employees who report to that person. This is repeated until there are no more employees who meet the condition. Let’s look at a simple example. A firm of IT consultants has several ongoing projects, and their policy is to schedule weekly progress meetings for each project. A table named projects holds details of new projects. A sample from this table looks like this: proj_namestart_dateend_datemeet_daymeet_time Online Shopping2024-05-012024-08-29209:00 Customer Migration2024-04-012024-05-16415:00 The firm wants to create details of scheduled meetings in a table named meetings; this information will be used to send out reminders and book a venue each week. The column meet_day holds the day of the week when meetings will be scheduled. It’s stored as a day number within the week, where 0 represents Sunday. You could achieve this with the following recursive query: WITH RECURSIVE date_list (proj_name, meet_date, end_date, meet_day, meet_time) AS ( SELECT proj_name, start_date, end_date, meet_day, meet_time FROM projects UNION ALL SELECT proj_name, meet_date + 1, end_date, meet_day, meet_time FROM date_list WHERE meet_date + 1 <= end_date ) INSERT INTO meetings SELECT proj_name, meet_date, meet_time FROM date_list WHERE meet_day = EXTRACT (DOW from meet_date) ORDER BY proj_name, meet_date; After the query has run, the table meetings holds the following data: proj_namemeet_datemeet_time Customer Migration2024-04-0315:00:00 Customer Migration2024-04-1015:00:00 Customer Migration2024-04-1715:00:00 Customer Migration2024-04-2415:00:00 Customer Migration2024-05-0115:00:00 Customer Migration2024-05-0815:00:00 Customer Migration2024-05-1515:00:00 Online Shopping2024-05-0709:00:00 Online Shopping2024-05-1409:00:00 Online Shopping2024-05-2109:00:00 Online Shopping2024-05-2809:00:00 Online Shopping2024-06-0409:00:00 Online Shopping2024-06-1109:00:00 Online Shopping2024-06-1809:00:00 Online Shopping2024-06-2509:00:00 Online Shopping2024-07-0209:00:00 Online Shopping2024-07-0909:00:00 Online Shopping2024-07-1609:00:00 Online Shopping2024-07-2309:00:00 Online Shopping2024-07-3009:00:00 Online Shopping2024-08-0609:00:00 Online Shopping2024-08-1309:00:00 Online Shopping2024-08-2009:00:00 Online Shopping2024-08-2709:00:00 Let’s break the query up and look at what it actually does. First, it defines the columns that will be included in the CTE date_list: WITH RECURSIVE date_list (proj_name, meet_date, end_date, meet_day, meet_time) Next it establishes the base data for the recursion, which is the contents of the projects table: AS ( SELECT proj_name, start_date, end_date, meet_day, meet_time from projects It then specifies what data must be included in each recursion, with a condition that ensures the recursion terminates when complete: UNION ALL SELECT proj_name, meet_date + 1, end_date, meet_day, meet_time FROM date_list WHERE meet_date + 1 <= end_date Finally, the main query inserts the results held in the virtual table into the table meetings. Does this sound useful? You can learn more about recursive queries and practice some real-world examples if you take our online CTE in PostgreSQL course. Learn More About PostgreSQL CTEs Although CTEs in PostgreSQL may not improve the performance of your queries, they certainly make complex queries easier to write and easier to understand. By splitting a long query into component parts, you can organize your thoughts and keep your coding simple. CTEs also facilitate working with hierarchical structures using the RECURSIVE clause. This article specifically uses PostgreSQL syntax and examples, but CTEs work in much the same way for other SQL dialects like MS SQL Server. If you want to get comfortable using CTEs, LearnSQL’s Common Table Expressions in PostgreSQL course has over 100 hands-on practice exercises that will help you really understand this tool. If you’d like some extra practice, try these free 11 SQL Common Table Expression Exercises. Each exercise gives you the kind of challenge you’d face in the real world, and solutions and explanations are included. And if you’re preparing for an interview, here’s some sample interview CTE questions and answers. I hope this article has given you a good idea of what PostgreSQL CTE can do for you. If you’d like to learn some other advanced PostgreSQL concepts, this article is a good place to start. Now it’s up to you! Remember, practice makes perfect, so check out our Advanced SQL Practice learning track for more hands-on practice of advanced SQL features! Tags: postgresql cte