14th Sep 2023 20 minutes read 11 SQL Window Functions Exercises with Solutions Martyna Sławińska window functions sql practice Table of Contents Window Functions in a Nutshell SQL Window Functions Practice Exercises: Online Movie Store Dataset Exercise 1: Rank Rentals by Price Solution Show solution Exercise 2: Find 2nd Giftcard-Purchasing Customer Solution Show solution Exercise 3: Calculate Running Total for Payments Solution Show solution SQL Window Functions Practice Exercises: Health Clinic Dataset Exercise 4: Calculate Moving Average for Scores Solution Show solution Exercise 5: Find the Difference Between Procedure Prices Solution Show solution Exercise 6: Find the Difference Between the Current and Best Prices Solution Show solution Exercise 7: Find the Best Doctor per Procedure Solution Show solution SQL Window Functions Practice Exercises: Athletic Championships Dataset Exercise 8: Calculate the Difference Between Daily Wind Speed Averages Solution:Show solution Exercise 9: Compare the Best and Previous Results Solution:Show solution SQL Window Functions Practice Exercises: Website Statistics Dataset Exercise 10: Look Ahead with the LEAD() Function Solution:Show solution Exercise 11: Look Back with the LAG() Function Solution:Show solution More SQL Window Functions Practice In this article, we present 11 practice exercises involving SQL window functions, along with solutions and detailed explanations. SQL window functions are a powerful feature that lets us extract meaningful insights from our data easily, yet few SQL courses offer SQL window functions exercises. This makes practicing window functions quite difficult. In this article, we’ll give you 11 practice exercises that feature window functions. All exercises shown in this article come from our interactive courses Window Functions and Window Functions Practice Set. The Window Functions course is an in-depth tutorial with over 200 exercises. We recommend this course to learn or review SQL window functions knowledge. Window Functions Practice Set is a practice course for those who know SQL window functions and are looking for more practice. Window Functions in a Nutshell SQL window functions are tools that help analyze data in different ways. From computing running totals and moving averages, comparing data within specific subsets, and identifying top performers, to achieving intricate ranking and partitioning, these functions empower us to extract deeper insights from our data – and accomplish complex tasks easily. SQL window functions offer a versatile toolkit for enhanced data analysis. This toolkit includes: ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE() for ranking data. SUM(), AVG(), COUNT(), MAX(), and MIN() for aggregations. LEAD() and LAG() for comparing data across rows. FIRST_VALUE() and LAST_VALUE() for extracting boundary values. Furthermore, the OVER() clause enables precise data partitioning and ordering within these functions, enabling users to perform complex calculations on defined subsets of data. Mastering SQL window functions is becoming increasingly crucial for data professionals, analysts, and engineers. It not only empowers them to efficiently solve complex analytical challenges, it also provides a deeper understanding of data. Practicing SQL window functions will improve your ability to create advanced queries and help you discover new insights from data. This is a vital skill in today's data-focused world. Before tackling the exercises, you may want to take a look at our window functions cheat sheet, which will remind you of the list of window functions and their syntax. SQL Window Functions Practice Exercises: Online Movie Store Before we start the exercises, let’s check out the dataset we’ll be using. Dataset The following exercises use the online movie store database, which contains six tables. The customer table stores information on all registered customers. The columns are id, first_name, last_name, join_date, and country. The movie table contains records of all movies available in the store. The columns are id, title, release_year, genre, and editor_ranking. The review table stores customer ratings of the movies. The columns are id, rating, customer_id (references the customer table), and movie _id (references the movie table). The single_rental table stores information about movies that were rented for a certain period of time by customers. The columns are id, rental_date, rental_period, platform, customer_id (references the customer table), movie _id (references the movie table), payment_date, and payment_amount. The subscription table stores records for all customers who subscribed to the store. The columns are id, length (in days), start_date, platform, payment_date, payment_amount, and customer_id (references the customer table). The giftcard table contains information about purchased gift cards. The columns are id, amount_worth, customer_id (references the customer table), payment_date, and payment_amount. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. Exercise 1: Rank Rentals by Price Exercise: For each single rental, show the rental_date, the title of the movie rented, its genre, the payment amount, and the rank of the rental in terms of the price paid (the most expensive rental should have rank = 1). The ranking should be created separately for each movie genre. Allow the same rank for multiple rows and allow gaps in numbering. Solution Show solution SELECT rental_date, title, genre, payment_amount, RANK() OVER(PARTITION BY genre ORDER BY payment_amount DESC) FROM movie JOIN single_rental ON single_rental.movie_id = movie.id; Solution explanation: The instruction tells us to show certain information about single rentals and movies. Thus, we join the single_rental table with the movie table on their common column (i.e. the movie_id column). Next, we need to rank all rentals in terms of the price paid per rental. To do so, we use RANK(). Then, in the OVER() clause, we order the data by the payment_amount column in descending order, so that the most expensive rental has the rank of 1. As the ranking should be created separately for each movie genre, in the OVER() clause, we partition the data by the genre column. Why did we choose RANK() instead of DENSE_RANK() or ROW_NUMBER()? The instruction says that the same rank for multiple rows is allowed; hence, we reduce the options to RANK() and DENSE_RANK(). The ROW_NUMBER() function assigns consecutive numbers as ranks to successive rows; it doesn’t allow multiple rows with the same rank. Gaps in row numbering are allowed, so we need the RANK() function. DENSE_RANK() doesn’t skip any number in a sequence, even if multiple rows have the same rank. The following table presents these ranking functions and how they work given a list of data values: VALUEROW_NUMBER()RANK()DENSE_RANK() Apple111 Apple211 Apple311 Carrot442 Banana553 Banana653 Peach774 Tomato885 Check out this article to learn more about different ranking functions. Exercise 2: Find 2nd Giftcard-Purchasing Customer Exercise: Show the first and last name of the customer who bought the second most-recent gift card, along with the date when the payment took place. Assume that a unique rank is assigned for each gift card purchase. Solution Show solution WITH ranking AS ( SELECT first_name, last_name, payment_date, ROW_NUMBER() OVER(ORDER BY payment_date DESC) AS rank FROM customer JOIN giftcard ON customer.id = giftcard.customer_id ) SELECT first_name, last_name, payment_date FROM ranking WHERE rank = 2; Solution explanation: We are going to show information about customers and their gift card purchases, so we need to join the customer table with the giftcard table on their common column, (customer_id). The instruction says to find the customer who bought the second most-recent gift card. To do that, let's first rank the gift card purchases using the ROW_NUMBER() function; we assume that a unique rank is assigned for each gift card purchase. The inner SELECT statement selects customer information and the dates of their gift card purchases. Then, we rank the rows using the ROW_NUMBER() function to mark the second most-recent gift card purchase (i.e. the rank value of 2). This inner SELECT statement is a Common Table Expression (CTE). It is wrapped inside the WITH clause and is named ranking. We select relevant data from this CTE and provide a condition in the WHERE clause to output only the row with rank equal 2. Why do we need to define a CTE and then query it? Because we cannot use the rank column in the WHERE clause of the inner SELECT. The reason is the order of execution, which is: FROM, JOINs, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, and LIMIT. So the rank column is not yet defined at the time when the WHERE clause of the inner SELECT would be executed. Exercise 3: Calculate Running Total for Payments Exercise: For each single rental, show the id, rental_date, payment_amount and the running total of payment_amounts of all rentals from the oldest one (in terms of rental_date) to the current row. Solution Show solution SELECT id, rental_date, payment_amount, SUM(payment_amount) OVER( ORDER BY rental_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM single_rental; Solution explanation: First, we select information about each single rental from the single_rental table. Next, we are going to find the running total of payment amounts of all rentals using the SUM() function (which takes the payment_amount column as its argument) with the OVER() clause. Here is an article explaining the details about running total and how to compute it in SQL. The instruction says to find the running total from the oldest rental date until the current row date. So, in the OVER() clause, we need to order the data by the rental_date column and then define ROWS to be counted in the running total, from the oldest date (BETWEEN UNBOUNDED PRECEDING) until the current date (AND CURRENT ROW). SQL Window Functions Practice Exercises: Health Clinic Dataset The following exercises use a health clinic database that contains two tables. The doctor table stores information about doctors. The columns are id, first_name, last_name, and age. The procedure table contains information about procedures performed by doctors on patients. The columns are id, procedure_date, doctor_id (references the doctor table), patient_id, category, name, price, and score. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. Exercise 4: Calculate Moving Average for Scores Exercise: For each procedure, show the following information: procedure_date, doctor_id, category, name, score and the average score from the procedures in the same category which are included in the following window frame: the two previous rows, the current row, and the three following rows in terms of the procedure date. Solution Show solution SELECT procedure_date, doctor_id, category, name, score, AVG(score) OVER( PARTITION BY category ORDER BY procedure_date ROWS BETWEEN 2 PRECEDING AND 3 FOLLOWING) FROM procedure; Solution explanation: We are going to show information for each procedure by selecting from the procedure table. Then we’ll find the average score from the procedures in the same category. To do that, we use the AVG() function with the score column as its argument. This is followed by the OVER() clause, where we partition the dataset into categories. Additionally, we must consider only rows included in the following window frame: the two previous rows, the current row, and the three following rows in terms of the procedure date. We define this data frame within the OVER() clause. First, we order the dataset by the procedure_date column so the procedures are listed chronologically. And then we define the rows to be considered for calculating the average score value: two previous rows (BETWEEN 2 PRECEDING) and three following rows (AND 3 FOLLOWING), including the current row. This is known as a moving average. You can learn more in What a Moving Average Is and How to Compute It in SQL. Exercise 5: Find the Difference Between Procedure Prices Exercise: For each procedure, show the following information: id, procedure_date, name, price, price of the previous procedure (in terms of the id) and the difference between these two values. Name the last two columns previous_price and difference. Solution Show solution SELECT id, procedure_date, name, price, LAG(price) OVER(ORDER BY id) AS previous_price, price - LAG(price) OVER(ORDER BY id) AS difference FROM procedure; Solution explanation: Again, we start by selecting information about each procedure from the procedure table. The instruction says to show the price of the previous procedure. To do that, we use the LAG() function, which returns the previous row value for its argument (here, for the price column). To ensure that we choose the previous procedure price in terms of the id, we order the dataset by the id column in the OVER() clause. We alias it as previous_price. Now that we have the price value and the previous price value, we can select the difference between these two values. We simply subtract the LAG() function from the price column and alias it as difference. Check out How to Calculate the Difference Between Two Rows in SQL to learn more. Exercise 6: Find the Difference Between the Current and Best Prices Exercise: For each procedure, show the: procedure_date name price category score Price of the best procedure (in terms of the score) from the same category (column best_procedure). Difference between this price and the best_procedure (column difference). Solution Show solution SELECT procedure_date, name, price, category, score, FIRST_VALUE(price) OVER(PARTITION BY category ORDER BY score DESC) AS best_procedure, price - FIRST_VALUE(price) OVER(PARTITION BY category ORDER BY score DESC) AS difference FROM procedure; Solution explanation: We start by selecting information about each procedure from the procedure table. The next step is to find the price of the best procedure. We use the FIRST_VALUE() function, which returns the first value in an ordered partition of a result set. To get the price of the best procedure from the same category, we must partition the dataset by the category column. And to get the price of the best procedure in terms of score, we must order the dataset by the score column in descending order. We alias this expression as best_procedure. Lastly, we find the difference between price and best_procedure by subtracting the FIRST_VALUE() function from the price column. Exercise 7: Find the Best Doctor per Procedure Exercise: Find out which doctor is the best at each procedure. For each procedure, select the procedure name and the first and last name of all doctors who got high scores (higher than or equal to the average score for this procedure). Rank the doctors per procedure in terms of the number of times they performed this procedure. Then, show the best doctors for each procedure, i.e. those having a rank of 1. Solution Show solution WITH cte AS ( SELECT name, first_name, last_name, COUNT(*) c, RANK() OVER(PARTITION BY name ORDER BY count(*) DESC) AS rank FROM procedure p JOIN doctor d ON p.doctor_id = d.id WHERE score >= (SELECT avg(score) FROM procedure pl WHERE pl.name = p.name) GROUP BY name, first_name, last_name ) SELECT name, first_name, last_name FROM cte WHERE rank = 1; Solution explanation: First, we select the procedure name and information about doctors, so we join the procedure table with the doctor table on their common column (doctor_id). We want to select all doctors who got high scores (higher than or equal to the average score for this procedure). To do that, we define the WHERE clause condition for the score column. The score column must store a value equal to or greater than the average score for the current row’s procedure. Let's rank the doctors per procedure. We'll use the RANK() function with the OVER() clause, where we partition the dataset by the procedure name. Additionally, we must rank in terms of the number of times the doctor performed this procedure. To get the number of times the doctor performed this procedure, we must COUNT(*) while grouping by the procedure name and the first and last name of the doctor (that is, we are grouping by all columns listed in the SELECT statement). All we’ve done until now is define a Common Table Expression (CTE), which is the inner SELECT statement enclosed by the WITH clause and named cte. Now we select the relevant columns from this CTE. To get the best doctors for each procedure (those having a rank of 1), we define the WHERE clause with the condition for the rank column. Why do we need to define a CTE and then query it? Because we cannot use the rank column in the WHERE clause of the inner SELECT. The reason is the order of execution, which is: FROM, JOINs, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, and LIMIT. The rank column has not been defined when the WHERE clause is executed. SQL Window Functions Practice Exercises: Athletic Championships Dataset The following exercises use the athletic championships database that contains eight tables. The competition table stores information about competitions. The columns are id, name, start_date, end_date, year, and location. The discipline table stores information for all running disciplines (from the short-distance runs (e.g. the 100 meter) to the long-distance runs (e.g. the marathon)). The columns are id, name, is_men, and distance. The event table stores information about the competition and discipline for each event. The columns are id, competition_id (references the competition table), and discipline_id (reference the discipline table). The round table stores the rounds of each event. The columns are id, event_id (references the event table), round_name, round_number, and is_final. The race table stores data for each race of each round. The columns are id, round_id (references the round table), round_name (same as in the round table), race_number, race_date, is_final (same as in the round table), and wind. The athlete table stores information about athletes participating in the competition. The columns are id, first_name, last_name, nationality_id (references the nationality table), and birth_date. The nationality table stores information about athlete’s countries of origin. The columns are id, country_name, and country_abbr. The result table stores information for all participants of a particular event. The columns are race_id (references the race table), athlete_id (references the athlete table), result, place, is_dsq, is_dns, and is_dnf. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. Exercise 8: Calculate the Difference Between Daily Wind Speed Averages Exercise: For each date in which there was a race, display the race_date, the average wind on this date rounded to three decimal points, and the difference between the average wind speed on this date and the average wind speed on the date before, also rounded to three decimal points. The columns should be named race_date, avg_wind, and avg_wind_delta. Solution:Show solution SELECT race_date, ROUND(AVG(wind), 3) AS avg_wind, ROUND(AVG(wind) - LAG(AVG(wind)) OVER(ORDER BY race_date), 3) AS avg_wind_delta FROM race GROUP BY race_date; Solution explanation: We are going to display race information for each race date, so we select data from the race table. To find the average wind speed on this date rounded to three decimal points, we use the AVG() function with the wind column as its argument. Then, we enclose it within the ROUND() function and round it to three decimal places. Note that we must group by the race_date column, since we use the AVG() aggregate function. We can get the average wind on the date before by using the LAG() function with the AVG(wind) value as its argument. The OVER() clause defines that we order the entire dataset by the race_date column to have the data rows listed chronologically. Since we want to see the difference between the average wind speed on this date and the average wind speed on the date before, we subtract LAG(AVG(wind)) from AVG(wind). And to round it to three decimal places, we use the ROUND() function again. Exercise 9: Compare the Best and Previous Results Exercise: For each woman who ran in the final round of the women's marathon in Rio, display the following information: The place they achieved in the race. Their first name. Their last name. comparison_to_best – The difference between their time and the best time in this final. comparison_to_previous – The difference between their time and the result for the athlete who got the next-highest Sort the rows by the place column. Solution:Show solution SELECT place, first_name, last_name, result - FIRST_VALUE(result) OVER (ORDER BY result) AS comparison_to_best, result - LAG(result) OVER(ORDER BY result) AS comparison_to_previous FROM competition JOIN event ON competition.id = event.competition_id JOIN discipline ON discipline.id = event.discipline_id JOIN round ON event.id = round.event_id JOIN race ON round.id = race.round_id JOIN result ON result.race_id = race.id JOIN athlete ON athlete.id = result.athlete_id WHERE competition.name = 'Rio de Janeiro Olympic Games' AND discipline.name = 'Women''s Marathon' AND round.is_final IS TRUE ORDER BY place; Solution explanation: We are going to use information about competitions, disciplines, rounds, athletes, and results. Therefore, we must join all these tables on their common columns, as mentioned in the dataset introduction. The instruction says to display information for each woman who ran in the final round of the women's marathon in Rio. We cover it in the WHERE clause that contains the following conditions: The competition name must be Rio de Janeiro Olympic Games. The discipline name must be Women's Marathon. The round must be the final round. Next we select the place column from the result table and the first_name and last_name columns from the athlete table. To find the difference between their time and the best time in this final, we use the FIRST_VALUE() function with the result column as its argument. This is followed by the OVER() clause, which orders the dataset by the result column. Then we subtract this FIRST_VALUE() function from the current row result. We alias it as comparison_to_best. To find the difference between their time and the result for the athlete who got the next-better place, we use the LAG() function with the result column as its argument to get the previous result. Once again, this is followed by the OVER() clause to order the dataset by the result column (to ensure we get the next better result). Then we subtract this LAG() function from the current row result. We alias it as comparison_to_previous. Finally, we sort the rows by the place column using the ORDER BY clause. SQL Window Functions Practice Exercises: Website Statistics Dataset The following exercises use the website statistics database that contains two tables. The website table stores information about websites. The columns are id, name, budget, and opened. The statistics table stores statistics for each The columns are website_id (references the website table), day, users, impressions, clicks, and revenue. Now that we are familiar with the dataset, let’s proceed to the SQL practice exercises. Exercise 10: Look Ahead with the LEAD() Function Exercise: Take the statistics for the website with id = 2 between 1 and 14 May 2016 and show the day, the number of users, and the number of users 7 days later. Note that the last 7 rows don't have a value in the last column. This is because no rows '7 days from now' can be found for them. For these cases, show -1 instead of NULL if no LEAD() value is found. Solution:Show solution SELECT day, users, LEAD(users, 7, -1) OVER(ORDER BY day) FROM statistics WHERE website_id = 2 AND day BETWEEN '2016-05-01' AND '2016-05-14'; Solution explanation: We are going to show the day, the number of users, and the number of users 7 days later. The first two values come from the statistics table – these are the day and users columns. The last value must be calculated using the LEAD() function. We want to see the users column value after seven days; therefore, we pass the users column as the first argument and the value of 7 as the second argument to the LEAD() function. And to ensure that we show -1 instead of NULL if no LEAD() value is found, we pass the third argument as -1. The LEAD() function is followed by the OVER() clause. This clause contains the condition to order the dataset by the day column, as the statistics should be ordered chronologically. To show the statistics for the website with id = 2 between 1 and 14 May 2016, we need to define the relevant conditions in the WHERE clause. Exercise 11: Look Back with the LAG() Function Exercise: Show the statistics for the website with id = 3 that include day, revenue, and the revenue 3 days before. Show -1.00 for rows with no revenue value 3 days before. Solution:Show solution SELECT day, revenue, LAG(revenue, 3, -1.00) OVER(ORDER BY day) FROM statistics WHERE website_id = 3; Solution explanation: We are going to show the day, revenue, and the revenue 3 days before. The first two values come from the statistics table – these are the day and revenue columns. The last value must be calculated using the LAG() function. We want to see the revenue column value from three days before the current row; therefore, we pass the revenue column as the first argument and the value of 3 as the second argument to the LAG() function. And to ensure that we show -1.00 for rows with no revenue value 3 days before, we pass the third argument as -1.00. The LAG() function is followed by the OVER() clause. This contains the condition to order the dataset by the day column, as the statistics should be ordered chronologically. To show the statistics for the website with id = 3, we need to define a condition in the WHERE clause. More SQL Window Functions Practice The SQL window functions practice exercises presented in this article provide a comprehensive platform for honing your SQL and data analysis skills one query at a time. These exercises come from our courses; to find additional practice exercises, visit the courses linked below. Window Functions Window Functions Practice Set If you aim to learn about or refresh your knowledge of window functions, we suggest you start with the Window Functions course, which offers a thorough exploration of this topic. For those seeking to hone their skills in window functions, explore our Window Functions Practice Set. It has 100 exercises structured into three distinct parts, each utilizing a different dataset. Sign up now and get started for free. Good luck! Tags: window functions sql practice