10th Aug 2023 24 minutes read Advanced SQL Practice: 10 Exercises with Solutions Martyna Sławińska sql sql practice Table of Contents Practicing Your Way to SQL Proficiency Advanced SQL Practice Exercises Section 1: Advanced SQL JOIN Exercises Exercise 1: List All Clothing Items Exercise 2: Get All Non-Buying Customers Exercise 3: Select All Main Categories and Their Subcategories Section 2: Advanced GROUP BY Exercises Exercise 4: Organize Runners Into Groups Exercise 5: How Many Runners Participate in Each Event Exercise 6: Group Runners by Main Distance and Age Section 3: Advanced Window Functions Exercises Exercise 7: List the Top 3 Most Expensive Orders Exercise 8: Compute Deltas Between Consecutive Orders Exercise 9: Compute the Running Total of Purchases per Customer Section 4: Advanced Recursive Query Exercises Exercise 10: Find the Invitation Path for Each Student Advancing One Query at a Time As SQL proficiency continues to be in high demand for data professionals and developers alike, the importance of hands-on practice cannot be emphasized enough. Read on to delve into the world of advanced SQL and engage in practical exercises to enhance your skills. This article provides you with a collection of ten challenging SQL practice exercises specifically for those seeking to enhance their advanced SQL skills. The exercises cover a selection of SQL concepts and will help you refresh your advanced SQL knowledge. Each exercise is accompanied by a detailed solution, allowing you to test your knowledge and gain a deeper understanding of complex SQL concepts. The exercises come from our advanced SQL practice courses. If you want to see more exercises like this, check out these courses: Window Functions Practice Set 2021 Monthly SQL Practice Sets - Advanced 2022 Monthly SQL Practice Sets - Advanced Find out how you can practice advanced SQL with our platform. Let’s get started. Practicing Your Way to SQL Proficiency Practice is an integral component in mastering SQL; its importance cannot be overstated. The journey to becoming proficient in advanced SQL requires dedication, perseverance, and a strong commitment to continuous practice. By engaging in regular advanced SQL practice, individuals can sharpen their skills, expand their knowledge, and develop a deep understanding of the intricacies of data management and manipulation. Advanced SQL exercises serve as invaluable tools, challenging learners to apply their theoretical knowledge in practical scenarios and further solidifying their understanding of complex concepts. With each session of dedicated SQL practice, you can discover efficient techniques and gain the confidence needed to tackle real-world data challenges. Let’s go over the exercises and their solutions. Advanced SQL Practice Exercises We’ll present various advanced SQL exercises that cover window functions, JOINs, GROUP BY, common table expressions (CTEs), and more. Section 1: Advanced SQL JOIN Exercises In the following advanced SQL exercises, we’ll use a sportswear database that stores information about clothes, clothing categories, colors, customers, and orders. It contains five tables: color, customer, category, clothing, and clothing_order. Let's look at the data in this database. The color table contains the following columns: idstores the unique ID for each color. name stores the name of the color. extra_fee stores the extra charge (if any) added for clothing ordered in this color. In the customer table, you'll find the following columns: id stores customer IDs. first_name stores the customer's first name. last_name stores the customer's last name. favorite_color_idstores the ID of the customer's favorite color (references the color table). The category table contains these columns: id stores the unique ID for each category. name stores the name of the category. parent_id stores the ID of the main category for this category (if it's a subcategory). If this value is NULL, it denotes that this category is a main category. Note: Values are related to those in the id column in this table. The clothing table stores data in the following columns: id stores the unique ID for each item. name stores the name of that item. size stores the size of that clothing: S, M, L, XL, 2XL, or 3XL. price stores the item's price. color_id stores the item's color (references the color table). category_id stores the item's category (references the category table). The clothing_order table contains the following columns: id stores the unique order ID. customer_id stores the ID of the customer ordering the clothes (references the customer table). clothing_id stores the ID of the item ordered (references the clothing table). items stores how many of that clothing item the customer ordered. order_date stores the date of the order. Let’s do some advanced SQL exercises that focus on JOINs. Exercise 1: List All Clothing Items Exercise: Display the name of clothing items (name the column clothes), their color (name the column color), and the last name and first name of the customer(s) who bought this apparel in their favorite color. Sort rows according to color, in ascending order. Solution: SELECT cl.name AS clothes, col.name AS color, cus.last_name, cus.first_name FROM clothing_order co JOIN clothing cl ON cl.id = co.clothing_id JOIN color col ON col.id = cl.color_id JOIN customer c ON cus.id = co.customer_id WHERE cus.favorite_color_id = cl.color_id ORDER BY col.name; Solution explanation: We want to display the column values from three different tables (clothing, color, and customer), including information on which customer ordered a certain item (from the clothing_order table). Therefore, we need to join these four tables on their common columns. First, we select from the clothing_order table (aliased as co) and join it with the clothing table (aliased as cl). We join the tables using the primary key column of the clothing table (id) and the foreign key column of the clothing_order table (clothing_id); this foreign key column links the clothing and clothing_order tables. Next, we join the color table (aliased as col) with the clothing table (aliased as cl). Here we use the primary key column of the color table (id) and the foreign key column of the clothing table (color_id). Finally, we join the customer table (aliased as cus) with the clothing_order table (aliased as co). The foreign key of the clothing_order table (customer_id) links to the primary key of the customer table (id). The ON clause stores the condition for the JOIN statement. For example, an item from the clothing table with an id of 23 is joined with an order from the clothing_order table where the clothing_id value equals 23. Follow this article to see more examples on JOINing three (or more) tables. And here is how to LEFT JOIN multiple tables. Exercise 2: Get All Non-Buying Customers Exercise: Select the last name and first name of customers and the name of their favorite color for customers with no purchases. Solution: SELECT cus.last_name, cus.first_name, col.name FROM customer cus JOIN color col ON col.id = cus.favorite_color_id LEFT JOIN clothing_order o ON o.customer_id = cus.id WHERE o.customer_id IS NULL; Solution explanation: Here we need to display customers’ first and last names from the customer table and their favorite color name from the color table. We must do it only for customers who haven’t placed any orders yet; therefore, we require information from the clothing_order table. So the next step is to join these three tables. First, we join the customer table (aliased as cus) with the color table (aliased as col). To do that, we use the following condition: the primary key column of the color table (id) must be equal to the foreign key column of the customer table (favorite_color_id). That lets us select the favorite color name instead of its ID. Here is how to ensure that we list only customers who haven’t placed any orders yet: We LEFT JOIN the clothing_order table (aliased as o) with the customer table (aliased as cus) to ensure that all rows from the customer table (even the ones with no match) are listed. In the WHERE clause, we define a condition to display only the rows with the customer_id column from the clothing_order table equal to NULL (meaning only the customers whose IDs are not in the clothing_order table will be returned). There are different types of JOINs, including INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. You can learn more by following the linked articles. Exercise 3: Select All Main Categories and Their Subcategories Exercise: Select the name of the main categories (which have a NULL in the parent_id column) and the name of their direct subcategory (if one exists). Name the first column category and the second column subcategory. Solution: SELECT c1.name AS category, c2.name AS subcategory FROM category c1 JOIN category c2 ON c2.parent_id = c1.id WHERE c1.parent_id IS NULL; Solution explanation: Each category listed in the category table has its own ID (stored in the id column); some also have the ID of their parent category (stored in the parent_id column). Thus, we can link the category table with itself to list main categories and their subcategories. The kind of JOIN where we join a table to itself is colloquially called a self join. When you join a table to itself, you must give different alias names to each copy of the table. Here we have one category table aliased as c1 and another category table aliased as c2. We select the name from the category table (aliased as c1) and ensure that we list only main categories by having its parent_id column equal to NULL in the WHERE clause. Next, we join the category table (aliased as c1) with the category table (aliased as c2). The latter one provides subcategories for the main categories. Therefore, in the ON clause, we define that the parent_id column of c2 must be equal to the id column of c1. Read this article to learn more about self joins. The exercises in this section have been taken from our course 2021 Monthly SQL Practice Sets - Advanced. Every month we publish a new SQL practice course in our Monthly SQL Practice track; every odd-numbered month, the course is at an advanced level. The advanced SQL practice courses from 2021 have been collected in our 2021 Monthly SQL Practice Sets - Advanced course. Check it out to find more JOIN exercises and other advanced SQL challenges. Section 2: Advanced GROUP BY Exercises In the following advanced SQL exercises, we’ll use a sportsclub database that stores information about runners and running events. It contains three tables: runner, event, and runner_event. Let's look at the data in this database. The runner table contains the following columns: id stores the unique ID of the runner. name stores the runner's name. main_distance stores the distance (in meters) that the runner runs during events. age stores the runner's age. is_female indicates if the runner is male or female. The event table contains the following columns: id stores the unique ID of the event. name stores the name of the event (e.g. London Marathon, Warsaw Runs, or New Year Run). start_date stores the date of the event. city stores the city where the event takes place. The runner_event table contains the following columns: runner_id stores the ID of the runner. event_id stores the ID of the event. Let’s do some advanced SQL exercises that focus on GROUP BY. Exercise 4: Organize Runners Into Groups Exercise: Select the main distance and the number of runners that run the given distance (runners_number). Display only those rows where the number of runners is greater than 3. Solution: SELECT main_distance, COUNT(*) AS runners_number FROM runner GROUP BY main_distance HAVING COUNT(*) > 3; Solution explanation: We want to get the count of runners for each distance that they run. To do that, we need to group all runners by distance and use the COUNT() aggregate function to calculate how many runners are in each distance group. We select the main_distance column and GROUP BY this column. Now when we use the COUNT() aggregate functions, it is going to give us the number of runners that match each main_distance value. The GROUP BY clause is used to group rows from a table based on one or more columns. It divides the result set into subsets or groups, where each group shares the same values in the specified column(s). This allows us to perform aggregate functions (such as SUM(), COUNT(), AVG(), etc.) on each group separately. Here are the most common GROUP BY interview questions. To display only the groups with more than three runners, we use a HAVING clause that filters the values returned by the COUNT() aggregate function. The HAVING clause is often used together with the GROUP BY clause to filter the grouped data based on specific conditions. It works similarly to the WHERE clause, but it operates on the grouped data rather than individual rows. Check out this article to learn more about the HAVING clause. Exercise 5: How Many Runners Participate in Each Event Exercise: Display the event name and the number of club members that take part in this event (call this column runner_count). Note that there may be events in which no club members participate. For these events, the runner_count should equal 0. Solution: SELECT event.name, COUNT(runner.id) AS runner_count FROM event LEFT JOIN runner_event ON runner_event.event_id = event.id LEFT JOIN runner ON runner_event.runner_id = runner.id GROUP BY event.name; Solution explanation: Here we want to display the event name from the event table and the number of participants from the runner table. The event and runner tables are linked by a many-to-many relation; to join these tables, we also need the runner_event table that relates events and runners. First, we select from the event table. Then, we LEFT JOIN it with the runner_event table, which is further LEFT JOINed with the runner table. Why do we use the LEFT JOIN here? Because we want to ensure that all events (even the ones with no participants) are displayed. We select the event name and the count of all participants; therefore, we need to GROUP BY the event name to get the count of participants per event. Please note that we use COUNT(runner_id) instead of COUNT(*). This is to ensure that we display zero for events with no participants (i.e. for events that do not link to any runner_id). You can read more about different variants of the COUNT() function here. Exercise 6: Group Runners by Main Distance and Age Exercise: Display the distance and the number of runners there are for the following age categories: under 20, 20–29, 30–39, 40–49, and over 50. Use the following column aliases: under_20, age_20_29, age_30_39, age_40_49, and over_50. Solution: SELECT main_distance, COUNT(CASE WHEN age < 20 THEN id END) AS under_20, COUNT(CASE WHEN age >= 20 AND age < 30 THEN id END) AS age_20_29, COUNT(CASE WHEN age >= 30 AND age < 40 THEN id END) AS age_30_39, COUNT(CASE WHEN age >= 40 AND age < 50 THEN id END) AS age_40_49, COUNT(CASE WHEN age >= 50 THEN id END) AS over_50 FROM runner GROUP BY main_distance; Solution explanation: This is similar to Exercise 4 – we want to know the number of runners per distance value. So we select the main_distance column and GROUP BY this column. Then, we use several COUNT() aggregate functions to get the number of runners per distance. However, here we need to further divide the runners according to their age. The CASE WHEN statement comes in handy here, as it can be used to evaluate conditions and return different values based on the results of those conditions. We can pass it as an argument to the COUNT() aggregate function to get the number of runners fulfilling a given condition. Let’s see how that works. CASE WHEN age >= 20 AND age < 30 THEN id END This CASE WHEN statement returns id only when a runner’s age is greater than or equal to 20 and less than 30. Otherwise, it returns NULL. When wrapped in the COUNT() aggregate function, it returns the count of runners fulfilling the condition defined in the CASE WHEN statement. To get the number of runners for each of the five age groups, we need to use as many COUNT() functions and CASE WHEN statements as we have age groups. You can read about counting rows by combining CASE WHEN and GROUP BY here. Section 3: Advanced Window Functions Exercises In the following advanced SQL exercises, we’ll use a Northwind database for an online shop with numerous foods. It contains six tables: customers, orders, products, categories, order_items, and channels. Let's look at the data in this database. The customers table has 15 columns: customer_id stores the ID of the customer. email stores the customer’s email address. full_name stores the customer’s full name. address stores the customer’s street and house number. city stores the city where the customer lives. region stores the customer’s region (not always applicable). postal_code stores the customer’s ZIP/post code. country stores the customer’s country. phone stores the customer’s phone number. registration_date stores the date on which the customer registered. channel_id stores the ID of the channel through which the customer found the shop. first_order_id stores the ID of the first order made by the customer. first_order_date stores the date of the customer’s first order. last_order_id stores the ID of the customer’s last (i.e. most recent) order. last_order_date stores the date of the customer’s last order. The orders table has the following columns: order_id stores the ID of the order. customer_id stores the ID of the customer who placed the order. order_date stores the date when the order was placed. total_amount stores the total amount paid for the order. ship_name stores the name of the person to whom the order was sent. ship_address stores the address (house number and street) where the order was sent. ship_city stores the city where the order was sent. ship_region stores the region in which the city is located. ship_postalcode stores the destination post code. ship_country stores the destination country. shipped_date stores the date when the order was shipped. The products table has the following columns: product_id stores the ID of the product. product_name stores the name of the product. category_id stores the category to which the product belongs. unit_price stores the price for one unit of the product (e.g. per bottle, pack, etc.). discontinued indicates if the product is no longer sold. The categories table has the following columns: category_id stores the ID of the category. category_name stores the name of the category. description stores a short description of the category. The order_items table has the following columns: order_id stores the ID of the order in which the product was bought. product_id stores the ID of the product purchased in the order. unit_price stores the per-unit price of the product. (Note that this can be different from the price in the product’s category; the price can change over time and discounts can be applied.) quantity stores the number of units bought in the order. discount stores the discount applied to the given product. The channels table has the following columns: id stores the ID of the channel. channel_name stores the name of the channel through which the customer found the shop. Let’s do some advanced SQL exercises that focus on window functions. Exercise 7: List the Top 3 Most Expensive Orders Exercise: Create a dense ranking of the orders based on their total_amount. The bigger the amount, the higher the order should be. If two orders have the same total_amount, the older order should go higher (you'll have to add the column order_date to the ordering). Name the ranking column rank. After that, select only the orders with the three highest dense rankings. Show the rank, order_id, and total_amount. Solution: WITH orders_with_ranking AS ( SELECT DENSE_RANK() OVER(ORDER BY total_amount DESC, order_date) AS rank, order_id, total_amount FROM orders ) SELECT * FROM orders_with_ranking WHERE rank <= 3; Solution explanation: Let’s start with the first part of the instruction. We want to create a dense ranking of orders based on their total_amount (the greater the value, the higher the rank) and their order_date value (the older the date, the higher the rank). Please note that the rank value may be duplicated only when total_amount and order_date columns are both equal for more than one row. To do that, we use the DENSE_RANK() window function. In its OVER() clause, we specify the order: descending for total_amount values and ascending for order_date values. We also display the order_id and total_amount columns from the orders table. Until now, we listed all orders along with their dense rank values. But we want to see only the top 3 orders (where the rank column is less than or equal to 3). Let’s analyze the steps we take from here: We define a Common Table Expression (CTE) using this SELECT statement – i.e. we use the WITH clause followed by the CTE’s name and then place the SELECT statement within parentheses. Then we select from this CTE, providing the condition for the rank column in the WHERE clause. You may wonder why we need such a complex syntax that defines a CTE and then queries it. You may say that we could set the condition for the rank column in the WHERE clause of the first SELECT query. Well, that’s not possible because of the SQL query order of execution. We have to use the Common Table Expression here because you can’t use window functions in the WHERE clause. The order of operations in SQL is as follows: FROM, JOIN WHERE GROUP BY Aggregate functions HAVING Window functions SELECT ORDER BY You may only use window functions in SELECT and ORDER BY clauses. If you want to refer to window functions in the WHERE clause, you must place the window function computation in a CTE (like we did in our example) or in a subquery and refer to the window function in the outer query. Follow this article to learn more about CTEs and recursive CTEs. To give you some background on the available ranking functions, there are three functions that let you rank your data: RANK(), DENSE_RANK(), and ROW_NUMBER(). Let’s see them in action. Values to be ranked RANK() DENSE_RANK() ROW_NUMBER() 1 1 1 1 1 1 1 2 1 1 1 3 2 4 2 4 3 5 3 5 3 5 3 6 4 7 4 7 5 8 5 8 The RANK() function assigns the same rank if multiple consecutive rows have the same value. Then, the next row gets the next rank as if the previous rows had distinct values. Here, the ranks 1,1,1 are followed by 4 (as if it was 1,2,3 instead of 1,1,1). The DENSE_RANK() function also assigns the same rank if multiple consecutive rows have the same value. Then, the next row gets the next rank one greater than the previous one. Here, 1,1,1 is followed by 2. The ROW_NUMBER() function assigns consecutive numbers to each next row without considering the row values. Here is an article on how to rank data. You can also learn more about differences between SQL’s ranking functions. Exercise 8: Compute Deltas Between Consecutive Orders Exercise: In this exercise, we're going to compute the difference between two consecutive orders from the same customer. Show the ID of the order (order_id), the ID of the customer (customer_id), the total_amount of the order, the total_amount of the previous order based on the order_date (name the column previous_value), and the difference between the total_amount of the current order and the previous order (name the column delta). Solution: SELECT order_id, customer_id, total_amount, LAG(total_amount) OVER(PARTITION BY customer_id ORDER BY order_date) AS previous_value, total_amount - LAG(total_amount) OVER(PARTITION BY customer_id ORDER BY order_date) AS delta FROM orders; Solution explanation: Here we select the order ID, customer ID, and total amount from the orders table. The LAG() function fetches the previous total_amount value. In the OVER() clause, we define the LAG() function separately for each customer and order the outcome by an order date. Finally, we subtract the value returned by the LAG() function from the total_amount value for each row to get the delta. The previous_value column stores null for the first row, as there are no previous values. Therefore, the delta column is also null for the first row. The following delta column values store the differences between consecutive orders made by the same customer. It is worth mentioning that a delta represents the difference between two values. By calculating the delta between daily sales amounts, we can determine the direction of sales growth/decline on a day-to-day basis. Follow this article to learn more about calculating differences between two rows. And here is how to compute year-over-year differences. Exercise 9: Compute the Running Total of Purchases per Customer Exercise: For each customer and their orders, show the following: customer_id – the ID of the customer. full_name – the full name of the customer. order_id – the ID of the order. order_date – the date of the order. total_amount – the total spent on this order. running_total – the running total spent by the given customer. Sort the rows by customer ID and order date. Solution: SELECT orders.customer_id, customers.full_name, orders.order_id, orders.order_date, orders.total_amount, SUM(orders.total_amount) OVER(PARTITION BY orders.customer_id ORDER BY orders.order_date) AS running_total FROM orders JOIN customers ON orders.customer_id = customers.customer_id ORDER BY orders.customer_id, orders.order_date; Solution explanation: A running total refers to the calculation that accumulates the values of a specific column or expression as rows are processed in a result set. It provides a running sum of the values encountered up to the current row. A running total is calculated by adding the current value to the sum of all previous values. This can be particularly useful in various scenarios, such as tracking cumulative sales, calculating running balances or analyzing cumulative progress over time. Follow this article to learn more about computing a running total. And here is an article about computing running averages. We select customer ID, order ID, order date, and order total from the orders table. Then, we join the orders table with the customers table on their respective customer_id columns so we can display the customer's full name. We use the SUM() window function to calculate the running total for each customer separately (PARTITION BY orders.customer_id) and then order ascendingly by date (ORDER BY orders.order_date). Finally, we order the output of this query by customer ID and order date. Section 4: Advanced Recursive Query Exercises In the following advanced SQL exercises, we’ll use a website database that stores information about students and courses. It contains three tables: student, course, and student_course. Let's look at the data in this database. The student table contains the following columns: id stores the unique ID number for each student. name stores the student's name. email stores the student's email. invited_by_id stores the ID of the student that invited this student to the website. If the student signed up without an invitation, this column will be NULL. The course table consists of the following columns: id stores the unique ID number for each course. name stores the course's name. The student_course table contains the following columns: id stores the unique ID for each row. student_id stores the ID of the student. course_id stores the ID of the course. minutes_spent stores the number of minutes the student spent on the course. is_completed is set to True when the student finishes the course. The exercises in this section have been taken from our Window Functions Practice Set. In this set, you will find more window function exercises on databases that store retail, track competitions, and website traffic. Let’s do some advanced SQL exercises that focus on recursive queries. Exercise 10: Find the Invitation Path for Each Student Exercise: Show the path of invitations for each student (name this column path). For example, if Mary was invited by Alice and Alice wasn't invited by anyone, the path for Mary should look like this: Alice->Mary. Include each student's id, name, and invited_by_id in the results. Solution: WITH RECURSIVE hierarchy AS ( SELECT id, name, invited_by_id, CAST(name AS text) AS path FROM student WHERE invited_by_id IS NULL UNION ALL SELECT student.id, student.name, student.invited_by_id, hierarchy.path || '->' || student.name FROM student, hierarchy WHERE student.invited_by_id = hierarchy.id ) SELECT * FROM hierarchy; Solution explanation: This exercise requires us to create a custom value for the path column that contains the invitation path for each customer. For example, Ann Smith was invited by Veronica Knight, who in turn was invited by Karli Roberson; hence, we get the path column as Karli Roberson->Veronica Knight->Ann Smith for the name Ann Smith. As you may notice, we need a recursion mechanism to dig down into the invitation path. We can write a recursive query by defining it with the WITH RECURSIVE statement, followed by the query name. The content of the hierarchy recursive query is as follows: We select the id, name, and invited_by_id columns from the student table. Then, we use the CAST() function to cast the name column type to the TEXT data type, ensuring smooth concatenation (with -> and the following names) in the main query. The WHERE clause condition ensures that only students who haven’t been invited are listed by this query. The UNION ALL operator combines the result sets of two or more SELECT statements without removing duplicates. Here the queries on which UNION ALL is performed have the same sets of four columns; the result set of one is appended to the results set of another. In the next SELECT statement, we again select the id, name, and invited_by_id columns from the student table. Then, we concatenate the path column (that comes from the hierarchy recursive query as defined in the first SELECT statement) with the -> sign and the student name. To accomplish this concatenation, we select from both the student table and the hierarchy recursive query.(This is where the recursive mechanism comes into play.) In the WHERE clause, we define that the invited_by_id column of the student table is equal to the id column of the hierarchy recursive query, so we get the student name who invited the current student; on the next iteration, we get the name of the student who invited that student, and so on. This is called a recursive query, as it queries itself to work its way down the invitation path. Advancing One Query at a Time The advanced SQL exercises presented in this article provide a comprehensive platform for honing your SQL skills, one query at a time. By delving into window functions, JOINs, GROUP BY, and more, you have expanded your understanding of complex SQL concepts and gained hands-on experience in solving real-world data challenges. Practice is the key to mastering SQL skills. Through consistent practice, you can elevate your proficiency and transform your theoretical knowledge into practical expertise. This article showcased exercises from our courses; you can discover more exercises like this by enrolling in our: Window Functions Practice Set 2021 Monthly SQL Practice Sets - Advanced 2022 Monthly SQL Practice Sets - Advanced Sign up now and get started for free! Good luck! Tags: sql sql practice