13th Dec 2016 6 minutes read Correlated Subquery in SQL: A Beginner’s Guide Maria Alcaraz subqueries Table of Contents A Simple Subquery Using a Correlated Subquery in SQL How Many Times is a SQL Correlated Subquery Executed? When to Use a Correlated Subquery in SQL Using Correlated Subqueries In UPDATE or DELETE Statement Try It Yourself Sometimes, using a SQL correlated subquery is the only way to solve a statement. But these subqueries can be very slow. In this post, we’ll talk about when to use a correlated subquery, why, and how to do it. Subqueries are an important resource for increasing the expressive power of SQL. If you haven’t read our previous article, subqueries are simply a SELECT statement inside another SELECT. We can use them in different places inside a SELECT, such as in the WHERE, HAVING, or FROM clauses. We can even use them in other statements like UPDATE or DELETE. Like simple subqueries, a SQL correlated subquery contains a query within a query. It gets its name because the two queries are related; the inner query uses information obtained from the outer query (e.g. from a table referenced in the outer query). For the programmers among you, this is similar to a nested loop structure. Before we proceed to an example of a correlated subquery in SQL, let’s have a quick refresher on simple (non-correlated) subqueries. A Simple Subquery The following SQL example uses a simple subquery to obtain a list of employees who earn more than the average company salary. In this case, the subquery is shown in bolded text, and its result is the company’s average salary. The outer query compares each employee’s salary with the company average. Easy! SELECT lastname, firstname, salary FROM employee WHERE salary > (SELECT avg(salary) FROM employee) Example 1: A simple non-correlated SQL subquery Next up are correlated subqueries. These are very special, because sometimes they are the only way to solve a query. However, think twice before using a correlated subquery in SQL. They can be slow, as we will explain later. Using a Correlated Subquery in SQL Let’s begin with an example of a correlated subquery in SQL. Suppose we want to find all employees with a salary higher than their average departmental salary. We would use the following query. Once again, I’ve bolded the subquery: SELECT lastname, firstname, salary FROM employee e1 WHERE e1.salary > (SELECT avg(salary) FROM employee e2 WHERE e2.dept_id = e1.dept_id) Example 2: A correlated subquery in SQL The main difference between a SQL correlated subquery and a simple subquery is that correlated subqueries reference columns from the outer table. In the above example, e1.dept_id iis a reference to the outer subquery table. To identify a correlated query, just look for these kinds of references. If you find at least one, you have a SQL correlated subquery! Let’s look at another example. Suppose we want to obtain the names of departments that have more than 10 employees. We can use the following SQL correlated subquery: SELECT deptname FROM department d1 WHERE 10 < (SELECT count(*) FROM employee e WHERE e.dept_id = d1.dept_id) Example 3: Another correlated subquery in SQL Time for one last example. We mentioned that subqueries can be part of the WHERE, FROM, HAVING and SELECT clauses. In this example, we will use a SQL correlated subquery in the SELECT list to find every employee name, their salary, and the average salary of their department. We will obtain the average salary using a correlated subquery inside the SELECT. Here’s the code: SELECT lastname, firstname, salary, (SELECT avg(salary) FROM employee e2 WHERE e2.dep_id = e1.dep_id) AS avg_dept_salary FROM employee e1 Example 4: A SQL correlated subquery in the SELECT list How Many Times is a SQL Correlated Subquery Executed? Suppose we have a table called “assigned_to_project” that stores the names of employees assigned to projects. We want to find all employees who are not assigned to any projects. The solution is the following query: SELECT lastname, firstname, salary FROM employee e1 WHERE NOT EXISTS (SELECT project_id FROM assigned_to_project WHERE employee_id = e1.employee_id) Example 5: A correlated subquery in SQL First of all, this query is easy to analyze. The NOT EXISTS clause is TRUE when the subquery returns an empty result set. This happens only for those employees not assigned to any project. Again, quite easy! However, the purpose of this section is to analyze how many times the SQL correlated subquery is executed. Try and figure out which of the following statements is right: It is executed just once. It is executed once for each project. It is executed once for each employee not assigned to any project. It is executed once for each employee in the company. Let’s analyze these options. Suppose we have 1,000 employees and 20 projects. Further, we have 800 employees who are already assigned to a project. In this case, the first option has one execution, the second has 20 executions, the third has 200, and the final option has 1,000. If you said that the correct answer was “once for each employee in the company”. you’re correct. If we go through the statement, this answer becomes obvious; we need to check if each employee has projects or not. According to the above example, though, this means that the subquery will be executed 1,000 times! In terms of performance, this is the worst scenario of the four. Because correlated subqueries tend to imply many executions, they also tend to be slow. As a rule, because of this, we always try to avoid using a correlated subquery in SQL. But as we already mentioned, sometimes correlation is the only way to solve a query. When to Use a Correlated Subquery in SQL At this point of the article the reader should be thinking: So when to use a correlated subquery in SQL?? Actually, there are some cases when we must. This is particularly true in queries where we’re looking for what we might call negatives. Here’s an example of a ‘negative’ query. Let’s assume we have a payment history table with a column called payment_type that indicates if a payment is regular salary, a bonus, or an award. If we want a query to return the employees who never received an award, we would use this query: SELECT lastname, firstname FROM employees e1 WHERE NOT EXISTS (SELECT ph.lastname FROM payment_history ph WHERE ph.emp_id = e1.employee_id AND ph.payment_type =’award’) Example 6: A SQL correlated subquery using NOT EXISTS EXISTS is an unary operator. It has only one operand, which is a subquery (correlated or not). If the subquery returns at least one record, then EXISTS returns TRUE. If the subquery returns no records, EXISTS returns FALSE. In this case, you must use a correlated subquery to get your results. Using Correlated Subqueries In UPDATE or DELETE Statement Sometimes we find correlated subqueries in UPDATE or DELETE statements. The following UPDATE has a SQL correlated subquery that obtains the new value of the all_money_made column: UPDATE employee emp SET all_money_made = (SELECT SUM(payment) FROM payment_history WHERE employee_id = emp.emp_id) Example 7: A SQL correlated subquery in an UPDATE This article has shown us when to use a correlated subquery in SQL. Queries looking for negatives are good candidates, although there are other occasions when a correlation is the only real option. We’ve also seen how many times a correlated subquery is executed – usually many, many times. This is their biggest drawback. Try It Yourself Correlated subqueries are an important resource for the SQL developer. To learn more and improve your subquery skills, try LearnSQL.com’s SQL Basics course. We have a specific section for subqueries, plus plenty of exercises and examples! Tags: subqueries