10th Oct 2017 5 minutes read Long SQL Query vs. Recursive SQL Query Aldo Zelen common table expressions CTE recursive queries SQL basics Table of Contents RECURSION WITH CTE Recursion is one of the central ideas in computer science. We can define it as a method for solving problems where the solution of the problem depends on solving a smaller instance of a problem. If this sounds complicated do not fret, in this article we will learn about recursion in SQL that you can practice and deepen at LearnSQL.com. Recursion is a way of solving hierarchical problems we find in data with common SQL. These types of queries are also called hierarchical queries. We can find recursion capability in standard SQL since SQL:1999 by way of recursive CTE's or common table expressions. Some RDMBS systems have their own way of implementing recursion, most notably Oracle's databases with CONNECT BY statement. Since recursive CTE is in the SQL standard and it is shared with all major RDBMS vendors we will explore this type of recursion. RECURSION WITH CTE Recursion is best mastered with viewing some hierarchical structure. There is no better example of hierarchical data than organization structure of a large enterprise. So we will explore a typical employees table that contains data about employees and there direct superiors. The table has identifier of the current employee and his direct superior as a reference to the same table. Besides the identifiers we also have in the table name and surname of the employee. We will construct a query that searches through all the rows in the table, starting from the first row usually called the anchor row. In our table the anchor row is the top manager, he has no reporting manager in the hierarchy above him so his manager_id attribute is null. SELECT id, manager_id, first_name, last_name, 0 depth_level FROM employees WHERE manager_id IS NULL ID MANAGER_ID FIRST_NAME LAST_NAME 1 John McGee Let's say that we want to see who John manages, what would the query look like? Something like this: SELECT id, manager_id, first_name, last_name FROM employees cur WHERE manager_id in ( SELECT id FROM employees WHERE manager_id IS NULL ) And for managers of those managers: SELECT id, manager_id, first_name, last_name FROM employees WHERE manager_id IN (SELECT id FROM employees WHERE manager_id IN ( SELECT id FROM employees WHERE manager_id IS NULL ) ) As you see there is a patter emerging there for every new management level we construct a new subquey. This approach is bad as it takes into account a fixed number of levels. Recursion is a way in which we take these subqueries and transform them so that they are general in a way that they represent the previous result of the query. In our management example the general part is constructed so that we JOIN previous result set to the current one based on the management id. SELECT cur.id, cur.manager_id, cur.first_name, cur.last_name FROM employees cur, previous WHERE cur.manager_id = previous.id This previous dataset is defined as a CTE. So the full recursive function looks like this : WITH previous(id, manager_id,first_name,last_name) AS ( SELECT id, manager_id, first_name, last_name FROM employees WHERE manager_id IS NULL UNION ALL SELECT cur.id, cur.manager_id, cur.first_name, cur.last_name FROM employees cur, previous WHERE cur.manager_id = previous.id ) SELECT * FROM previous; The recursion starts with the top manager and is joined by every new level in management hierarchy. The final SELECT returns the whole dataset. ID MANAGER_ID FIRST_NAME LAST_NAME 1 John McGee 2 1 Kate Doe 7 1 Ethan Lee 9 1 Emily McPers 3 2 Ethan Smith 4 2 Alexander Lam 8 7 Sophia Wilson 10 9 Jacob Gagnon 12 9 Madison Morin 5 4 Ava Marin 6 4 Olivia Roy 11 10 Logan Tremblay We can expand this query to make it more useful, let's say that we want to see the levels in hierarchy. We do this by constructing a new parameter that we increment, let's call it depth_level. For every level of the hierarchy the number is increased by 1. WITH previous(id, manager_id,first_name,last_name,depth_level) AS ( SELECT id, manager_id, first_name, last_name, 0 depth_level FROM employees WHERE manager_id IS NULL UNION ALL SELECT cur.id, cur.manager_id, cur.first_name, cur.last_name, previous.depth_level+1 depth_level FROM employees cur, previous WHERE cur.manager_id = previous.id ) SELECT previous.* FROM previous; So we get the levels of the hierarchy. ID MANAGER_ID FIRST_NAME LAST_NAME DEPTH_LEVEL 1 John McGee 0 2 1 Kate Doe 1 7 1 Ethan Lee 1 9 1 Emily McPers 1 3 2 Ethan Smith 2 4 2 Alexander Lam 2 8 7 Sophia Wilson 2 10 9 Jacob Gagnon 2 12 9 Madison Morin 2 5 4 Ava Marin 3 6 4 Olivia Roy 3 11 10 Logan Tremblay 3 We can use this depth_level to represent the data in a more graphical way with the query WITH previous(id, manager_id,first_name,last_name,depth_level) AS ( SELECT id, manager_id, first_name, last_name, 0 depth_level FROM employees WHERE manager_id IS NULL UNION ALL SELECT cur.id, cur.manager_id, cur.first_name, cur.last_name, previous.depth_level+1 depth_level FROM employees cur, previous WHERE cur.manager_id = previous.id ) SELECT previous.*, RPAD('.', (depth_level)*2, '.') || id AS tree FROM previous; and the result set: ID MANAGER_ID FIRST_NAME LAST_NAME DEPTH_LEVEL TREE 1 John McGee 0 1 2 1 Kate Doe 1 ..2 7 1 Ethan Lee 1 ..7 9 1 Emily McPers 1 ..9 3 2 Ethan Smith 2 ....3 4 2 Alexander Lam 2 ....4 8 7 Sophia Wilson 2 ....8 10 9 Jacob Gagnon 2 ....10 12 9 Madison Morin 2 ....12 5 4 Ava Marin 3 ......5 6 4 Olivia Roy 3 ......6 11 10 Logan Tremblay 3 ......11 Recursion is not the most intuitive part of computer science but it is integral. The best way to learn recursion is by diligent and persistent practice. There is no better place to practice SQL than on LearnSQL.com. So open your browser go through the examples in the Recursive Queries course and you will be on your way to SQL mastery. Tags: common table expressions CTE recursive queries SQL basics