28th Oct 2020 16 minutes read What are SQL CTE Best Practices? Tihomir Babic sql learn sql CTE Recursive Queries Table of Contents What Are CTEs? CTEs Make Your Code More Organized and Readable Avoid Repeating Subqueries by Using CTEs Replace a Subquery in the FROM Clause Best Practices for Naming CTEs Give Meaningful Names to Your CTEs Be Consistent When Naming CTEs Apply the Same Naming Convention to Both Tables and CTEs You Can’t Avoid CTEs When Using Recursive Queries The Benefits of Using CTEs Are you wondering what the best practices for using common table expressions are? This article will help you learn when to use a CTE and how to write it. If you have heard about SQL CTEs, you have probably noticed they’re often mentioned together with subqueries. Sometimes, people think there’s no difference compared to the subqueries, and there’s often debate about whether a CTE or subquery should be used to get a particular result. The truth is, most of the queries written with CTEs can also be written with subqueries. The keyword here is most! There are, of course, some situations in which only CTEs can help you. There are also situations in which you can choose between a CTE and subquery, but using a CTE is often more comfortable. I’ll show you what those situations are and the CTE best practices you should apply when writing a CTE query. What Are CTEs? CTEs, or common table expressions, are a temporary named data set returned by a query. The syntax of this SQL feature is: WITH expression_name AS (CTE definition) CTEs are also called WITH queries, and if you take a look at the syntax above, it’s obvious why. If you need to refresh your memory, you can read this article, which nicely shows how this basic syntax works in practice. Speaking of practice, why not warm your fingers by writing some CTE code in the Recursive Queries course or the Advanced SQL track. It’s now time I lead you through some benefits of using CTEs. CTEs Make Your Code More Organized and Readable Here’s the scenario: you’re working for a company that employs freelancers for various jobs. The data is stored in three tables: freelancer hours_worked job_position The table freelancer contains the following data: id: the ID of the freelancer. first_name: the first name of the freelancer. last_name: the last name of the freelancer. country: the country the freelancer works from. pay_rate: the pay rate per hour. job_type_id: the type of the job (the FK from the table job_position). The second table, hours_worked, has columns: id: the ID of the hours worked. date: the date of the hours worked. hours: the number of hours worked. freelancer_id: the ID of the freelancer (the FK from the table freelancer). The last table is job_position: id: the ID of the job position type. position_type: the name of the job position type. In this first example, I will show you how using CTEs can make your code more organized and readable. Using the above tables, we need to list all freelancers, their pay rate, the job type they are hired for, and the average pay rate for the particular job position type. If you want to use a subquery, here’s what the code will look like: SELECT f.id, f.first_name, f.last_name, f.pay_rate, j.position_type, sq.average_job_rate FROM freelancer f JOIN job_position j ON f.job_type_id = j.id JOIN (SELECT j.id, AVG (f.pay_rate) AS average_job_rate FROM freelancer f JOIN job_position j ON f.job_type_id = j.id GROUP BY j.id) AS sq ON j.id = sq.id; The code takes the following columns from the table freelancer: id, first_name, last_name, and pay_rate. Then, it takes the column position_type from the table job_position. Those two tables are given their aliases, and they are joined on the columns f.job_type_id and j.id. That’s fine, but I’m still missing the average pay rate per job. I’ve joined the table, which is not a table, but a subquery in this case. In this subquery, I’ve again joined the tables freelancer and job_position on the columns f.job_type_id and j.id. I’ve done that so I can select the column id from the table job_position. After that, it calculates the average pay rate using the AVG() function, with the result appearing in the new column average_job_rate. The result is then grouped by the column id since I want the result to be per job type. Finally, the subquery is named sq. After I’ve written the subquery, I’ve joined it with the other tables on the columns j.id and sq.id. Now that I have the subquery written, I’ve put the column sq.average_job_rate in the main SELECT statement. When you run the code, you’ll get the following result: idfirst_namelast_namepay_rateposition_typeaverage_job_rate 1CaroleQueen60SQL Developer58.9 2ArthurStrauss42SQL Developer58.9 3CindyJones75SQL Developer58.9 4PaulHopkins25.5Content Writer24.618181 5KatherineMcLean32Content Writer24.618181 6DonJones18.5Content Writer24.618181 7LindaStones40.5Data Analyst39.45 8TrentArthurs12.5Data Analyst39.45 9MiriamJones52Data Analyst39.45 10GuyGudbois36SQL Developer58.9 11MichelleLebouef40SQL Developer58.9 12Jean-PaulLamontagne26Content Writer24.618181 13CamilleBarbet16Content Writer24.618181 14JulesFournier24.8Content Writer24.618181 15PauletteDubois32Content Writer24.618181 16MohamedLambert38Content Writer24.618181 17SimonePomeroy36.7Data Analyst39.45 18GaspardChevalier51Data Analyst39.45 19AnoukDe Jong80SQL Developer58.9 20AmbroosJansen75SQL Developer58.9 21HendrikaDe Vries41SQL Developer58.9 22CaetanoVan de Berg84.5SQL Developer58.9 23GrietjeVan Dijk55.5SQL Developer58.9 24DaelJanssen18Content Writer24.618181 25UdayVisser28Content Writer24.618181 26HadwinBakker12Content Writer24.618181 27LivMeijer44Data Analyst39.45 Everything’s okay with the result, but I’m not satisfied with the code. Doesn’t it seem a little unreadable? I have difficulty analyzing it even though I just wrote it. Imagine how difficult it would be if I returned to it in a few days and wanted to change something. Or even worse, what if I had to analyze code written by someone else in such a way? I don’t know about you, but I don’t like having the SELECT statements in JOIN. It makes my brain work in the opposite way of what is logical for me. I also want to have the simple name of the tables when I’m joining them, not disturbing this flow with a chunk of another SELECT statement, i.e., subquery. Is it possible to make this code easier to read? Of course, that’s the first reason I’d use a CTE instead. Have a look at some code written with CTE: WITH average_rate_per_job AS ( SELECT j.id, AVG (f.pay_rate) AS average_job_rate FROM freelancer f JOIN job_position j ON f.job_type_id = j.id GROUP BY j.id ) SELECT f.id, f.first_name, f.last_name, f.pay_rate, j.position_type, a.average_job_rate FROM freelancer f JOIN job_position j ON f.job_type_id = j.id JOIN average_rate_per_job a ON f.job_type_id = a.id; Doesn’t that look a little nicer and easier to analyze? Let’s do exactly that! I’ve created a CTE using the WITH statement. The name I’ve given to the CTE is average_rate_per_job. The name is followed by the keyword AS, and the CTE is defined in the parentheses. If you care to check, it’s entirely the same as the subquery in the previous code. However, for me, it organizes the code better. Logically, I would first calculate the average rate per job and try to “merge” this result with the table employee. After I’ve calculated the average rate per job, it’s time to list all the employees and job position types they’re hired for. Once again, the SELECT statement is the same as in the first code, save for different aliases and, fortunately, no subquery following the JOIN clause. As you can see, the code takes the columns id, first_name, last_name, and pay_rate from the table freelancer, and the column position_type from the table job_position. Finally, it selects the column average_job_rate from the CTE. All those tables, including the CTE, are joined together. I’ve simply treated the CTE as a table when joining it, using its name average_rate_per_job as I would when joining any other table. When you run the code, you will get the same result as running the first messy code. Avoid Repeating Subqueries by Using CTEs Let’s try something else. Use the data to get the freelancers and their average daily hours worked. Also, calculate the total average daily hours worked. Then, find the difference between those two values. Select only those freelancers who worked on average at least one hour above the total daily average. How would you do this using subqueries? Here’s how: SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer, (SELECT AVG (hours) FROM hours_worked) AS daily_avg_total, AVG (h.hours) - (SELECT AVG (hours) FROM hours_worked) AS diff FROM hours_worked h JOIN freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name HAVING (AVG (h.hours) - (SELECT AVG (hours) FROM hours_worked) > 1); This flamboyant code selects the columns id, first_name, and last_name from the table freelancer. Then it calculates the average daily hours worked by the freelancer, with the values shown in the column daily_avg_freelancer. After that comes the subquery, which calculates the total average daily hours worked. It again uses the AVG() function, with values in the column daily_avg_total. Now, I have to calculate the difference between those two values, which means I have to write the same subquery again: AVG (h.hours) - (SELECT AVG (hours) FROM hours_worked) AS diff. Since I need the data by the freelancer, the result has to be grouped by the id, first_name, and last_name from the table freelancer. Finally, I have to set up a condition that will return only freelancers who worked at least one hour above the total average. Following the HAVING clause, there it is, the same old subquery showing up for the third time. It would be nice if, instead of repeating the same subquery, I could write something like AVG (h.hours) - daily_avg_freelancer, i.e., using the subquery’s name every time I need it. Unfortunately, SQL doesn’t work that way. But is there a solution? Yes, you’ve guessed it: CTE. Take a look at this! It’s the same code as above. Except now, it’s written using CTEs: WITH total_average AS ( SELECT AVG (hours) AS daily_avg_total FROM hours_worked ), avg_per_freelancer AS ( SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer FROM hours_worked h JOIN freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name ) SELECT *, (daily_avg_freelancer - daily_avg_total) AS diff FROM avg_per_freelancer, total_average WHERE (daily_avg_freelancer - daily_avg_total) > 1; Using the WITH statement, I’ve created the total_average CTE. It’s a simple SELECT statement that calculates the total daily average hours worked. Then comes the second CTE, and there’s no WITH statement. Remember, when writing two or more CTEs, you only write WITH before the first one. All the subsequent CTEs are separated only by a comma. This second CTE is named avg_per_freelancer. It does exactly what it says. It calculates the average daily hours worked per freelancer. Not much is different compared to the original code written using subqueries. What is different is the CTE does calculations, and the code looks tidier since the calculation steps are organized in separate parts of the code. Now we only need to calculate the difference. This means subtracting one CTE from the other. Here comes the magical part! There’s no need to write the CTEs all over again like you had to in the case of the subqueries. You simply refer to the CTEs by their name and select the columns you want. In this case, it is (daily_avg_freelancer - daily_avg_total) AS diff. The SELECT statement also takes all the values from both CTEs. You treat them like any other table in the FROM clause. The condition is stated using the WHERE clause. Again, you don’t have to write the same CTEs you’ve already written. Just use the CTEs one more time: WHERE (daily_avg_freelancer - daily_avg_total) > 1. Running the query will return the table below: idfirst_namelast_namedaily_avg_freelancerdaily_avg_totaldiff 2ArthurStrauss10.3333336.6333333.7 6DonJones86.6333331.366667 8TrentArthurs126.6333335.366667 12Jean-PaulLamontagne86.6333331.366667 16MohamedLambert8.6666666.6333332.033333 18GaspardChevalier106.6333333.366667 19AnoukDe Jong126.6333335.366667 22CaetanoVan de Berg7.6666666.6333331.033333 26HadwinBakker7.8333336.6333331.2 27LivMeijer8.56.6333331.866667 To learn even more about how to organize your queries, you can read this article. If you’re looking for more examples for practice, you can find several here. Replace a Subquery in the FROM Clause You’ve probably seen it, and maybe you’ve used it. Writing a subquery in the FROM clause is also called a derived table in SQL Server and Oracle. When using a subquery in the FROM clause, it has to have a name. For instance, the code could look something like this: SELECT hs.first_name, hs.last_name, hs.pay_rate * hs.hours_per_freelancer AS monthly_pay FROM ( SELECT f.id, f.first_name, f.last_name, f.pay_rate, SUM (h.hours) AS hours_per_freelancer FROM freelancer f JOIN hours_worked h ON f.id = h.freelancer_id GROUP BY f.id, f.first_name, f.last_name, f.pay_rate ) AS hs; This query calculates the total monthly pay for every freelancer based on their hourly rate and hours charged. It seems a rather complicated query for such a simple calculation. Let’s analyze it, but this time, let’s start from the subquery in the FROM clause. The subquery selects the columns id, first_name, last_name, and pay_rate from the table freelancer. It also sums the hours worked in the new column hours_per_freelancer, taking data from the table hours_worked. The tables are, of course, joined to get the result. This result has to be by the freelancer. That’s why it is grouped by the id, first_name, and last_name columns. I’ve named the subquery hs. The outer query uses the subquery as a table. It selects the needed columns and multiplies the pay rate with the hours worked by the freelancer, with the result shown in the column monthly_pay. If you use a subquery in the FROM clause, it’s good to replace it with a CTE. Your code will be much shorter, less complicated, and much more transparent. In that case, the code will be: WITH hours_charged AS ( SELECT f.id, SUM (h.hours) AS hours_per_freelancer FROM freelancer f JOIN hours_worked h ON f.id = h.freelancer_id GROUP BY f.id ) SELECT f.first_name, f.last_name, f.pay_rate * hc.hours_per_freelancer AS monthly_pay FROM freelancer f JOIN hours_charged hc ON f.id = hc.id; The CTE is named hours_charged. It selects the column id from the table freelancer. It also calculates the total hours worked by using the SUM() function. The result will show up in the column hours_per_freelancer. Since I want the result by the freelancer, it is grouped by the column id from the table freelancer. Once the CTE is defined, I can use it in the SELECT statement. This one simply selects the first and last name of the freelancer. It also returns the monthly pay as a result of this code line: f.pay_rate * hc.hours_per_freelancer AS monthly_pay. This is the result of both queries: first_namelast_namemonthly_pay CaroleQueen1020 ArthurStrauss1302 CindyJones1425 PaulHopkins102 KatherineMcLean640 DonJones444 LindaStones121.5 TrentArthurs300 MiriamJones572 GuyGudbois2052 MichelleLebouef480 Jean-PaulLamontagne624 CamilleBarbet96 JulesFournier446.4 PauletteDubois128 MohamedLambert988 SimonePomeroy440.4 GaspardChevalier1530 AnoukDe Jong10560 AmbroosJansen1275 HendrikaDe Vries451 CaetanoVan de Berg1943.5 GrietjeVan Dijk166.5 DaelJanssen252 UdayVisser168 HadwinBakker282 LivMeijer1122 Now that you have started to see the benefits of CTEs, here’s an article that analyzes the differences between CTEs and subqueries. Best Practices for Naming CTEs Even though one of the benefits of using CTEs is making your code more organized and readable, it’s also up to you to keep your code neat. The best way to do that is by paying attention to how you name your CTEs. Here are some pieces of advice that I apply when naming CTEs. I recommend that you do the same. Give Meaningful Names to Your CTEs This is important because you’ll reference CTEs by their name. It is a nice feature that makes CTEs very helpful in writing code. Don’t mess it up by giving them names that are not related to what you’re doing. If you do, you’ll have no idea what a particular CTE does. You’ll have to analyze it every time you want to reference it somewhere. I’m guessing you have many more exciting things to do in your life. Take, for instance, one of the codes above, the one containing two CTEs. Just by looking at the names of the CTEs, you’ll know or assume, at least, what each of those CTEs does, even if you’re seeing this code for the first time. WITH total_average AS ( SELECT AVG (hours) AS daily_avg_total FROM hours_worked ), avg_per_freelancer AS ( SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer FROM hours_worked h JOIN freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name ) SELECT *, (daily_avg_freelancer - daily_avg_total) AS diff FROM avg_per_freelancer, total_average WHERE (daily_avg_freelancer - daily_avg_total) > 1; Imagine I’ve decided to name those CTEs as a and b. Can you imagine the mess? If you can’t, here, let me show you: WITH a AS ( SELECT AVG (hours) AS daily_avg_total FROM hours_worked ), b AS ( SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer FROM hours_worked h JOIN freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name ) Do you know what each CTE does? Me neither. Look at the query using those CTEs: SELECT *, (daily_avg_freelancer - daily_avg_total) AS diff FROM b, a WHERE (daily_avg_freelancer - daily_avg_total) > 1; Such a simple SELECT statement made incomprehensible by choosing the wrong names. What is a, and what is b in this statement? Good luck with analyzing it! Be Consistent When Naming CTEs You’ve probably noticed that all my CTEs are written in lower case. If the name consists of two or more words, I separate them by using the underscore. It doesn’t mean you have to do the same. You can use only uppercase if you wish to. You can separate the words by starting every word with the capital letter. The only thing I want you to remember is to choose one way of naming CTEs and stick to it. Please, please, please don’t do something like this: WITH Total_AveRage AS (...) ) AVGPERfreelancer AS (...) SELECT *, (daily_avg_freelancer - daily_avg_total) AS diff FROM TOTAL_AVERAGE, avgperfreelancer WHERE (daily_avg_freelancer - daily_avg_total) > 1; It would hurt my brain if it were medically possible. Yours, too, trust me. Apply the Same Naming Convention to Both Tables and CTEs Several times in this article, I’ve mentioned you should treat CTEs as any other table. This also applies to naming them. Once you’ve chosen a naming convention and decided to stick to it, apply it to both CTEs and tables. Lines in the code are like sentences. You want to be able to read your thoughts and allow others to do the same. This also means keeping your lines of code unified by sticking to the chosen naming convention. Otherwise, IT woulD be much_harder FORYOU to read YOUR_OWNCode. For example, you’ll see that I write all the CTE and table names in lowercase. Also, all the words in the name are separated by an underscore. avg_per_freelancer AS ( SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer FROM hours_worked h JOIN freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name If you’re not consistent in naming, your code could look like this: avgperfreelancer AS ( SELECT f.id, f.first_name, f.last_name, AVG (h.hours) AS daily_avg_freelancer FROM hours_worked h JOIN Freelancer f ON h.freelancer_id = f.id GROUP BY f.id, f.first_name, f.last_name Even though your WITH query has a meaningful name, it’s tough to read it and realize it’s meaningful. It also doesn’t look nice if all the other tables are named using different conventions, e.g., hours_worked. You Can’t Avoid CTEs When Using Recursive Queries Those of you who already use the recursive queries know that. You can’t write a recursive query without a CTE. I don’t mean you can’t, as in you shouldn’t. No, you can’t do that without a CTE. If you’re not that advanced yet, you’re maybe wondering what a recursive query is. Recursive queries reference themselves. By doing so, they return the sub-result and repeat the process until they return the final result. I just wanted to mention this and won’t go through recursive queries with you. I’ll gladly leave you to my colleagues though. They can unlock the power of recursive queries for you. If you’re using PostgreSQL or Oracle, this article will lead you through recursive queries. The Benefits of Using CTEs With several examples, I’ve tried to show you the differences between a CTE and a subquery. By doing so, I hope I was able to show you when it’s good to use CTEs instead. Whenever you decide to use CTEs, it’s essential to have in mind several naming conventions that could make writing them even easier. While not minding those conventions won’t affect the performance of the code itself, it could diminish one of the benefits of using CTEs, improving the code organization and readability. Last but not least, if you’re interested in learning recursive queries, knowing CTEs is a must. If you liked what you read here, feel free to learn even more in our Recursive Queries course. This article will give you all the info you need to find out what you can learn there. Feel free to use the comment section to let me know how this article helped you or how I could’ve done better. Tags: sql learn sql CTE Recursive Queries