10th Apr 2018 5 minutes read Learn to Write a SQL Correlated Subquery in 5 Minutes Ignacio L. Bisso data analysis how to SQL basics subqueries Table of Contents What Exactly is a SQL Correlated Subquery? When Should You Use a SQL Correlated Subquery? SQL Correlated Subqueries Vs. Simple Subqueries Another Correlated Subquery Example Enough Negativity. What About Positive Data Questions? Warning: Your SQL Correlated Subquery Is Likely to Be Slow Try a SQL Correlated Subquery Yourself! If you’re familiar with the famous Russian nesting doll, then SQL correlated subqueries should be a peace of cake to understand—subqueries are just queries nested within queries. An SQL subquery is often called an “inner” query; the main query is usually called the “outer” query. This article covers everything you need to know about correlated subqueries. What Exactly is a SQL Correlated Subquery? A correlated SQL subquery is just a subquery that is executed many times—once for each record (row) returned by the outer (main) query. In other words, the outer query returns a table with multiple rows; the inner query then runs once for each of those rows. If your outer query returns 10 rows, then the inner query will run 10 times. And if your outer query returns 100 rows, the inner query will run 100 times. When Should You Use a SQL Correlated Subquery? Some data questions can only be answered with correlated subqueries. This is particularly true when asking negative data questions. Negative data questions arise when we search for records that don’t satisfy a particular condition. An example of a simple negative data question is: “Obtain the names of all movies not produced by Steven Spielberg.” We’ll solve an example of a negative data question in this article. Before we do so, let's first review the tables involved. We have two tables: the employee table and the payment_history table. The second table has a column named payment_type that denotes whether a payment is part of a regular salary, a bonus, or an award. Let's see some sample data from the tables we will use: employee employee_id payment_type amount_paid payment_date 100 salary 2000.00 2018-Mar-02 101 salary 1800.00 2018-Mar-02 102 salary 1900.00 2018-Mar-02 101 award 500.00 2018-Mar-08 102 adjustment 124.70 2018-Mar-10 payment_history employee_id payment_type amount_paid payment_date 100 salary 2000.00 2018-Mar-02 101 salary 1800.00 2018-Mar-02 102 salary 1900.00 2018-Mar-02 101 award 500.00 2018-Mar-08 102 adjustment 124.70 2018-Mar-10 Here’s the negative data question we’d like to answer: “Obtain the names of employees who never received an award” The SQL query that answers the previous question is the following: SELECT last_name, first_name FROM employee e1 WHERE NOT EXISTS (SELECT ph.last_name FROM payment_history ph WHERE ph.employee_id = e1.employee_id AND ph.payment_type = 'award') SQL Correlated Subqueries Vs. Simple Subqueries The main difference between a SQL correlated subquery and a simple subquery is that a SQL correlated subquery references columns from the table of the outer query. In the above example, ph.employee_id = e1.employee_id is a reference to the outer subquery table (e1). To identify a correlated subquery, just look for these kinds of references. If you find at least one, you have a correlated subquery! The negative part of a data question is often solved in a SQL correlated subquery by using the NOT EXISTS operator in the WHERE clause. EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then EXISTS evaluates to TRUE. If the subquery returns an empty set, then EXISTS evaluates to FALSE. Note that we use NOT EXISTS, which is simply the opposite of EXISTS. The result of the previous query is: first_name last_name John Smith Alice Johnson Another Correlated Subquery Example In this example, we’ll try to obtain the names of all employees who earned higher salaries in March 2018 than their average monthly salaries for all previous months using a SQL correlated subquery. Here’s the query we’ll run: SELECT first_name, last_name FROM employee e1, payment_history ph WHERE e1.employee_id = ph.employee_id AND amount_paid > = ( SELECT AVG(amount_paid) FROM payment_history ph2 WHERE ph2.employee_id = e1.employee_id AND ph2.payment_date < '01/03/2018' AND ph2.payment_type = 'salary' ) AND month(ph.payment_date) =3 AND year(ph.payment_date) = 2018 AND ph.payment_type ='salary' Enough Negativity. What About Positive Data Questions? Must we use a SQL correlated subquery to answer a positive data question? No, you don’t have to. You can still do so if you want to, though. For positive questions, we can usually just use a JOIN condition or a relationship between two tables. Let's change our previous question to a positive one and solve it with a JOIN instead a correlated subquery. The question becomes: “Obtain the names of employees who received award payments.” And the SQL query (without correlated subqueries) that answers this question is: SELECT first_name, last_name FROM employee e1 JOIN payment_history ph ON ph.employee_id = e1.employee_id WHERE ph.payment_type =award' The result is: first_name last_name Kate Miller Warning: Your SQL Correlated Subquery Is Likely to Be Slow I’d just like to mention that we try not to overuse SQL correlated subqueries, if possible. Recall that a correlated subquery runs once for each record returned by the outer query. If the outer query returns thousands upon thousands of records, you can imagine how quickly your query will slow down in performance. In general, you should only use a SQL correlated subquery if it’s absolutely necessary. Try a SQL Correlated Subquery Yourself! We’ve seen that correlated subqueries are an important part of the SQL language and can help us answer different data questions, especially negative ones. We also explained how to recognize a SQL correlated subquery and why we should generally try to avoid correlated subqueries, if possible, due to performance reasons. To improve your subquery skills, try LearnSQL.com’s SQL Basics course. We have a specific section devoted to subqueries, with plenty of practical exercises and correlated subqueries examples to help you master the content. Tags: data analysis how to SQL basics subqueries