28th Nov 2023 22 minutes read What Are SQL Window Functions? Tihomir Babic sql window functions Table of Contents A Brief History of SQL Window Functions What’s a Window? SQL Window Function Syntax Dataset and Syntax Examples Syntax Example #1: OVER () Syntax Example #2: OVER (ORDER BY) Syntax Example #3: OVER (PARTITION BY) Syntax Example #4: OVER (ORDER BY PARTITION BY) What Are the Most Common Window Functions? Real-World SQL Window Function Examples Example #1: Percentage of Total Window Function vs. Aggregate Function vs. GROUP BY Solve This Exercise for Practice Example #2: Rank Data Solve This Exercise for Practice Example #3: Running Total Solve this exercise for practice Example #4: Quarter-To-Quarter Difference Solve This Exercise for Practice SQL Window Functions: A Window to Better Data Analysis Need to up your data analysis game? Learn these SQL window functions here and you’ll take your analysis skills to the next level. Practice exercises and detailed explanations included! When I first heard about SQL window functions, I thought it was some strange marriage between SQL and Windows. I was wrong. The SQL window functions have nothing to do with that famous operating system. They are SQL functions that do computations on the set of rows related to the current row. This set of rows is called a window or window frame – hence the function name. You might also hear about windowing functions in SQL, analytical functions, or the OVER() functions. These are all just alternative names for SQL window functions – an extremely helpful set of tools for data analysis. In this article, we’ll show you what you can do with window functions and how. I’ll start with a short history of SQL window functions and explain why they are called that. Then, I’ll guide you through the syntax and show you how it works with several examples. After practicing syntax, we’ll be ready for real-life window function examples from a data analyst’s life. And here’s the juiciest part: after each example, there’s an exercise for you to solve and learn through coding. However, the main source of your knowledge on this subject should be our Window Functions course. Its 218 interactive exercises cover SQL window functions in detail. In other words, you’ll learn about window frames and the OVER(), PARTITION BY, and ORDER BY clauses. All this is necessary to aggregate, rank, and analyze data using window functions. A Brief History of SQL Window Functions Window functions were first introduced in the Oracle8i Database, which was released in 1998. However, they were included in the SQL standard five years later with SQL:2003. Then Microsoft included them in SQL Server 2005. Other database management systems (DBMS) followed; PostgreSQL has supported them since PostgreSQL 8.4 was released in 2009; MariaDB included them with the 10.2 version (2016), and MySQL added them to version 8 in 2018. Window functions are a rather new feature in SQL. Because of that, they are not part of the usual SQL curriculum. By learning them, you’ll be ahead of the curve compared to many SQL users. What’s a Window? A set of rows related to the current row is called a window or a window frame. Hence, the name of these functions: their result is based on a sliding window frame. For example, you can calculate a cumulative sum as shown below: datesalescumulative_sum 2023-10-014,2414,241 2023-10-022,3896,630 2023-10-031,5808,210 2023-10-043,39511,605 2023-10-051,26512,870 The window for the 2023-10-04 cumulative sum is highlighted in green. It includes the current row (for 2023-10-04) and all the previous rows. So the cumulative sum is calculated as the sum of all the previous and current sales: 4,241 + 2,389 + 1,580 + 3,395 = 11,605. (Note that the row outlined in red dots is not included in the window or the sum.) When we move to the next row, the window also moves: it will now include all the previous rows (green) and the current row (red dotted). Now the cumulative sum is 4,241 + 2,389 + 1,580 + 3,395 + 1,265 = 12,870. So the window is the set of rows related to the current row that are used in computations for this row. The window changes (slides) as we move across the rows; thanks to these images of a sliding window, we get the name of these functions. SQL Window Function Syntax The syntax for window functions is: SELECT column_1, column_2, <window_function> OVER(PARTITION BY … ORDER BY … ) AS column_alias FROM table; Here’s what each part does: <window_function> – Specifies the function to apply to that window. OVER() – Defines the window (set of rows) and indicates that this is a window function; without this clause, it’s not a window function. <window_frame> – Defines the window frame size (optional). PARTITION BY – Divides the window into smaller groups called partitions (optional); if omitted, the whole result set is one partition. ORDER BY – Sorts rows within the window frame (optional), i.e., decides in which order the window operation will be performed; if omitted, the order of rows within the partition is arbitrary. Additional clauses can further define the window. Their syntax is: [<ROWS or RANGE clause> BETWEEN <lower_bound> AND <upper_bound>] The ROWS clause defines the window in terms of the fixed number of rows in relation to the current row. The RANGE clause does the same. But it also takes into calculation all the rows with the same values in the columns specified in the ORDER BY clause as the current row. The window bounds can be defined as UNBOUNDED PRECEDING – All the rows before the current row. n PRECEDING – A defined number of rows before the current row. CURRENT ROW – Includes the current row. n FOLLOWING – A defined number of rows after the current row. UNBOUNDED FOLLOWING – All the rows after the current row. Let’s now see how this works in practice. Dataset and Syntax Examples We’ll use the table album_catalogue in all these examples. You can create it yourself using this script. A data snapshot is shown below: idalbum_titlealbum_lengthalbum_genreartistcopies_soldsales_period 1Wednesday Morning, 3 A.M0:31:38FolkSimon & Garfunkel10432022_1Q 2EnRoute: John Scofield Trio LIVE1:13:48JazzJohn Scofield Trio5122022_1Q 3Nasty Gal0:39:15FunkBetty Davis8092022_1Q 4The New Folk Sound of Terry Callier0:37:41FolkTerry Callier9032022_1Q 5In a Silent Way0:38:08JazzMiles Davis4282022_1Q The dataset is a list of albums with their length, genre, artist, and sales data, including the number of copies sold and the period (quarters). The data goes all the way to the third quarter of 2023. I’ll first show you several examples, explaining each crucial part of the windowing functions’ syntax along the way. Syntax Example #1: OVER () You can use the SUM() window function with only the OVER() clause to get the total sales in the fourth quarter of 2022: SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER() AS sold_in_4Q_2022 FROM album_catalogue WHERE sales_period = '2022_4Q'; I want the sum of the copies sold, so I specify this column in SUM(). The OVER() clause is mandatory. If you want to use OVER() without any of the optional clauses, just leave the parentheses empty. I use WHERE to output only data from the desired quarter. When you write a query like this – with an empty OVER() – the whole result set (selected columns, applied filters, etc.) is taken into account when performing the window function calculations. Here, the result shows the individual sales of every album sold in the fourth quarter of 2022. It also shows the total sales of all albums sold in that period. sales_periodalbum_titleartistcopies_soldsold_in_4q_2022 2022_4QWednesday Morning, 3 A.MSimon & Garfunkel8097,403 2022_4QEnRoute: John Scofield Trio LIVEJohn Scofield Trio6127,403 2022_4QNasty GalBetty Davis3697,403 2022_4QThe New Folk Sound of Terry CallierTerry Callier2147,403 2022_4QIn a Silent WayMiles Davis657,403 2022_4QCold SweatJames Brown2097,403 2022_4QThe Freewheelin' Bob DylanBob Dylan2467,403 2022_4QMy Favorite ThingsJohn Coltrane3777,403 2022_4QA Whole New ThingSly and the Family Stone8167,403 2022_4QFive Leaves LeftNick Drake4007,403 2022_4QHead HuntersHerbie Hancock4097,403 2022_4QIn the Right PlaceDr. John9127,403 2022_4QBlueJoni Mitchell4127,403 2022_4QConciertoJim Hall6127,403 2022_4QDirty MindPrince9417,403 With the help of SUM() and OVER(), I’m able to show each individual album’s sales and quarterly total. Syntax Example #2: OVER (ORDER BY) You can add additional clauses inside the OVER() clause to change the definition of the window frame. One such clause is ORDER BY. The ORDER BY clause defines the sorting of rows within a window frame: the rows can be processed by the window function in a given order. Let’s see an example. You can calculate the cumulative sum by adding ORDER BY to the previous query. In this example, I want to see how the album ‘In the Right Place’ sells over time and the cumulative number of albums sold up to a given period of time. Here’s the query: SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER(ORDER BY sales_period ASC) AS cumulative_sum FROM album_catalogue WHERE album_title = 'In the Right Place'; The same column copies_sold is in SUM(). This time, OVER() contains the ORDER BY clause. You want to show the cumulative sales from the earliest to the latest quarter. That’s why you need sales_period and ASC in ORDER BY. sales_periodalbum_titleartistcopies_soldcumulative_sum 2022_1QIn the Right PlaceDr. John222222 2022_2QIn the Right PlaceDr. John208430 2022_3QIn the Right PlaceDr. John94524 2022_4QIn the Right PlaceDr. John9121436 2023_1QIn the Right PlaceDr. John9122348 2023_2QIn the Right PlaceDr. John562404 2023_3QIn the Right PlaceDr. John5622966 In each row, you can see the sales for each quarter and the cumulative sum, i.e., the sum of the current and all the previous quarters. For instance, the album sold 94 copies in the third quarter of 2022. The total sales in 2022 up until then (or in three quarters) is: 222 + 208 + 94 = 524. Syntax Example #3: OVER (PARTITION BY) Another clause you can use in OVER() is PARTITION BY. PARTITION BY is used to divide the window into smaller segments based on some criteria. For example, you can list the albums, their sales data for the fourth quarter of 2022, and the sales by genre for that quarter: SELECT album_title, artist, copies_sold, album_genre, SUM (copies_sold) OVER(PARTITION BY album_genre) AS sales_by_genre FROM album_catalogue WHERE sales_period = '2022_4Q'; Again, we use the same window function SUM(). This time, though, we use PARTITION BY to divide the window into smaller segments based on album genre. Everything else stays the same. The query returns the result below. It’s an analysis of album sales data by genre for the last quarter of 2022. album_titleartistcopies_soldalbum_genresales_by_genre Wednesday Morning, 3 A.MSimon & Garfunkel809Folk2,081 The Freewheelin' Bob DylanBob Dylan246Folk2,081 Five Leaves LeftNick Drake400Folk2,081 The New Folk Sound of Terry CallierTerry Callier214Folk2,081 BlueJoni Mitchell412Folk2,081 Dirty MindPrince941Funk3,247 Nasty GalBetty Davis369Funk3,247 Cold SweatJames Brown209Funk3,247 A Whole New ThingSly and the Family Stone816Funk3,247 In the Right PlaceDr. John912Funk3,247 Head HuntersHerbie Hancock409Jazz2,075 EnRoute: John Scofield Trio LIVEJohn Scofield Trio612Jazz2,075 In a Silent WayMiles Davis65Jazz2,075 ConciertoJim Hall612Jazz2,075 My Favorite ThingsJohn Coltrane377Jazz2,075 For instance, the cumulative sum for the folk albums is 809 + 246 + 400 + 214 + 412 = 2,081. Syntax Example #4: OVER (ORDER BY PARTITION BY) You can also use both PARTITION BY and ORDER BY in OVER(). The rows are divided into segments with PARTITION BY and processed in a given order by ORDER BY. Using the query below, I can show all the album analytical data and calculate the cumulative sum for each album separately: SELECT sales_period, album_title, artist, copies_sold, SUM (copies_sold) OVER(PARTITION BY album_title ORDER BY sales_period ASC) AS cumulative_sum_by_album FROM album_catalogue; I calculate this with the help of the SUM() window function, as I did earlier. I partitioned the window by album. This means that the sum will be cumulated until the function reaches the last row of a particular album. When it reaches another album, it resets and starts accumulating the sum from the beginning. I also use ORDER BY to instruct the function to cumulate the sum from the earliest to the latest quarter. sales_periodalbum_titleartistcopies_soldcumulative_sum_by_album 2022_1QA Whole New ThingSly and the Family Stone674674 2022_2QA Whole New ThingSly and the Family Stone257931 2022_3QA Whole New ThingSly and the Family Stone6661,597 2022_4QA Whole New ThingSly and the Family Stone8162,413 2023_1QA Whole New ThingSly and the Family Stone8163,229 2023_2QA Whole New ThingSly and the Family Stone3023,531 2023_3QA Whole New ThingSly and the Family Stone1233,654 2022_1QBlueJoni Mitchell589589 2022_2QBlueJoni Mitchell184773 2022_3QBlueJoni Mitchell2561,029 2022_4QBlueJoni Mitchell4121,441 2023_1QBlueJoni Mitchell4121,853 2023_2QBlueJoni Mitchell991,952 2023_3QBlueJoni Mitchell9952,947 …………… 2022_1QWednesday Morning, 3 A.MSimon & Garfunkel1,0431,043 2022_2QWednesday Morning, 3 A.MSimon & Garfunkel4371,480 2022_3QWednesday Morning, 3 A.MSimon & Garfunkel1841,664 2022_4QWednesday Morning, 3 A.MSimon & Garfunkel8092,473 2023_1QWednesday Morning, 3 A.MSimon & Garfunkel8093,282 2023_2QWednesday Morning, 3 A.MSimon & Garfunkel3253,607 2023_3QWednesday Morning, 3 A.MSimon & Garfunkel6124,219 You can see that the cumulative sum for ‘A Whole New Thing’ is 3,654. The next album (‘Blue’) starts with the next row, so the cumulation is reset: the cumulative sum is the same as the individual sales of the album in the first quarter of 2022. Then, it accumulates until it reaches the next album. The result goes all the way until the last album, which is ‘Wednesday Morning, 3 A.M.’ in our case. I’ve shown you the most common ways of defining window frames with these examples. But these are not the only ways. You can also use the ROW or RANGE clauses with the syntax and bounds we explained earlier. Don’t worry. You’ll see the practical use of this in real-world examples. I used only one function, SUM(), throughout these examples. This is one of many window functions; let’s quickly review some of the others. What Are the Most Common Window Functions? The most common window functions can be divided into three categories: Aggregate Window Functions: COUNT() – Counts the number of rows within a window. SUM() – Totals given values within a window. AVG() – Calculate the average of given values within a window. MIN() – Finds the smallest value within a window. MAX() – Finds the largest value within a window. Ranking Window Functions: ROW_NUMBER() – Ranks values sequentially, with different ranks for the tied values. RANK() – Ranks values using the same rank for tied values; skips the next rank after the ties (e.g. 1, 2, 2, 4). DENSE_RANK() – Ranks values using the same rank for tied values; doesn’t skip the next rank after the ties (e.g. 1,2,2,3,4). Analytic Window Functions: LEAD() – Gets data from a defined offset (i.e. a stated number of rows) after the current row. LAG() – Gets data from a defined offset (i.e. a stated number of rows) before the current row. There are more window functions you could find useful. Check them out in our free SQL Window Functions Cheat Sheet. Real-World SQL Window Function Examples So far, I’ve focused more on the syntax of SQL window functions. I’ll now show you the most common practical uses of window functions and how they can help data analysts in their work. These examples will use the same dataset as earlier. Example #1: Percentage of Total Let’s show info about each album and its sales in the first quarter of 2023. Additionally, we will show quarterly sales by each genre. Then, let’s calculate how much each album (as a percentage) contributes to genre sales. SELECT album_title, artist, copies_sold, album_genre, SUM(copies_sold) OVER (PARTITION BY album_genre) AS sales_by_genre, (copies_sold*1.0/ SUM(copies_sold) OVER (PARTITION BY album_genre))*100.0 AS percent_of_genre_sales FROM album_catalogue WHERE sales_period = '2023_1Q' ORDER BY album_genre, copies_sold DESC; To get the sales by genre, I again use the SUM() window function. In the OVER() clause, I use only PARTITION BY. That way, I can partition the window by the album genre. In the next code line, I divide the copies sold (of each album) and divide by the genre sales. To do that, simply copy the calculation from the previous line. Then, multiply the quotient by 100 to get the percentage. You’ll notice that I also multiplied copies_sold with 1.0. This is for converting integers to decimal values. Filter the desired quarter using WHERE. Finally, order the output alphabetically by genre and then descendingly by copies sold. Here’s the result: album_titleartistcopies_soldalbum_genresales_by_genrepercent_of_genre_sales Wednesday Morning, 3 A.MSimon & Garfunkel809Folk2,08138.88 BlueJoni Mitchell412Folk2,08119.80 Five Leaves LeftNick Drake400Folk2,08119.22 The Freewheelin' Bob DylanBob Dylan246Folk2,08111.82 The New Folk Sound of Terry CallierTerry Callier214Folk2,08110.28 Dirty MindPrince941Funk3,24728.98 In the Right PlaceDr. John912Funk3,24728.09 A Whole New ThingSly and the Family Stone816Funk3,24725.13 Nasty GalBetty Davis369Funk3,24711.36 Cold SweatJames Brown209Funk3,2476.44 EnRoute: John Scofield Trio LIVEJohn Scofield Trio612Jazz2,07529.49 ConciertoJim Hall612Jazz2,07529.49 Head HuntersHerbie Hancock409Jazz2,07519.71 My Favorite ThingsJohn Coltrane377Jazz2,07518.17 In a Silent WayMiles Davis65Jazz2,0753.13 Let’s check the calculation for the first row. The album by Simon & Garfunkel sold 809 copies. The total folk album sales for that quarter were 2,081. So the percentage of the individual sales in total genre sales is 809/2,081*100 = 38.88%. The sum of percentages for each genre should be 100%. Let’s check this on a folk genre: 38.88% + 19.80% + 19.22% + 11.82% + 10.28 % = 100%. Window Function vs. Aggregate Function vs. GROUP BY I’m again using the aggregate function as a window function. I could’ve used a simple aggregate SUM() function with GROUP BY to get sales by each genre for the specified quarter. What’s the difference, then? A window function allows you to show both analytical and aggregate data (individual sales with sales by genre and the quotient of these values), while an aggregate function used with GROUP BY would collapse the individual row and show only the aggregate value (the sum of sales for the quarter). Solve This Exercise for Practice Using window functions, rewrite the above query so that it shows the average sale by genre. Also, show how much each album's sales are above or below the genre average (as a percentage). Show only sales from the third quarter of 2023. Show the album title, artist, copies sold, and the album genre. Sort the output ascendingly by genre and individual album sales. Solution: SELECT album_title, artist, copies_sold, album_genre, AVG(copies_sold) OVER (PARTITION BY album_genre) AS average_sales_by_genre, ((copies_sold/AVG(copies_sold) OVER (PARTITION BY album_genre))-1)*100 AS pct_from_average FROM album_catalogue WHERE sales_period = '2023_3Q' ORDER BY album_genre, copies_sold; Output: album_titleartistcopies_soldalbum_genreaverage_sales_by_genrepct_from_average The New Folk Sound of Terry CallierTerry Callier283Folk561.6-49.61 Five Leaves LeftNick Drake321Folk561.6-42.84 The Freewheelin' Bob DylanBob Dylan597Folk561.66.30 Wednesday Morning, 3 A.MSimon & Garfunkel612Folk561.68.97 BlueJoni Mitchell995Folk561.677.17 A Whole New ThingSly and the Family Stone123Funk533.4-76.94 Dirty MindPrince169Funk533.4-68.32 In the Right PlaceDr. John562Funk533.45.36 Nasty GalBetty Davis808Funk533.451.48 Cold SweatJames Brown1005Funk533.488.41 ConciertoJim Hall263Jazz464-43.32 My Favorite ThingsJohn Coltrane302Jazz464-34.91 EnRoute: John Scofield Trio LIVEJohn Scofield Trio404Jazz464-12.93 Head HuntersHerbie Hancock542Jazz46416.81 In a Silent WayMiles Davis809Jazz46474.35 Example #2: Rank Data In this example, I’ll use a window function to rank data. I want to show each distinct album title and its length and rank them by length. The longest album will be ranked first. SELECT *, RANK() OVER (ORDER BY album_length DESC) AS album_length_rank FROM (SELECT DISTINCT album_title, album_length FROM album_catalogue) AS distinct_album; Let’s start by explaining the subquery: we use it to select distinct albums and their lengths. Then, we use the main query to select all the data from the subquery. Now, use the RANK() window function to rank albums. You can also use other ranking functions, depending on your data and tasks. For the ranking to work the way you want to, use the ORDER BY clause in OVER(). Specify the column by which you want to rank and in what order. In this case, it’s descendingly by length. Here’s the ranking: album_titlealbum_lengthalbum_length_rank EnRoute: John Scofield Trio LIVE1:13:481 The Freewheelin' Bob Dylan0:44:142 Head Hunters0:41:523 Five Leaves Left0:41:434 My Favorite Things0:40:255 Nasty Gal0:39:156 In a Silent Way0:38:087 Concierto0:38:028 A Whole New Thing0:38:019 The New Folk Sound of Terry Callier0:37:4110 Blue0:36:1511 Cold Sweat0:33:4312 In the Right Place0:33:2213 Wednesday Morning, 3 A.M0:31:3814 Dirty Mind0:30:1415 Solve This Exercise for Practice Rank each unique album by its sales within its genre. Show only data for the first quarter of 2023. Show the album title, its sales, genre, and rank. If there are albums with the same number of sales, rank them equally and don't skip the next rank. Solution: SELECT *, DENSE_RANK() OVER (PARTITION BY album_genre ORDER BY copies_sold DESC) AS album_sales_rank FROM (SELECT DISTINCT album_title, copies_sold, album_genre FROM album_catalogue WHERE sales_period = '2023_1Q') AS distinct_album; Output: album_titlecopies_soldalbum_genrealbum_sales_rank Wednesday Morning, 3 A.M809Folk1 Blue412Folk2 Five Leaves Left400Folk3 The Freewheelin' Bob Dylan246Folk4 The New Folk Sound of Terry Callier214Folk5 Dirty Mind941Funk1 In the Right Place912Funk2 A Whole New Thing816Funk3 Nasty Gal369Funk4 Cold Sweat209Funk5 EnRoute: John Scofield Trio LIVE612Jazz1 Concierto612Jazz1 Head Hunters409Jazz2 My Favorite Things377Jazz3 In a Silent Way65Jazz4 Example #3: Running Total In this example, I’ll show a particular album’s sales period, title, artist, and the copies sold. I’ll also add a running total of copies sold that will include three rows: the current row and the two previous. The sum should be calculated from the earliest to the latest quarter. SELECT sales_period, album_title, artist, copies_sold, SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sales_running_total FROM album_catalogue WHERE album_title = 'In a Silent Way'; I again use the SUM() window function. Then there’s an ORDER BY in OVER() to order the sales so we can sum them ascendingly. Next, I need to define the moving window frame. The running total should include the current row and the previous two rows. These are the lower and upper bounds specified in the ROWS clause. The lower bound is two previous rows, i.e., BETWEEN 2 PRECEDING. The upper bound is CURRENT ROW. The two bounds are stitched together in a window frame using the keyword AND. I want to show the calculation for Miles Davis’s album ‘In a Silent Way’, so I filter data using WHERE. Here are the running totals: sales_periodalbum_titleartistcopies_soldsales_running_total 2022_1QIn a Silent WayMiles Davis428428 2022_2QIn a Silent WayMiles Davis1,0531,481 2022_3QIn a Silent WayMiles Davis191,500 2022_4QIn a Silent WayMiles Davis651,137 2023_1QIn a Silent WayMiles Davis65149 2023_2QIn a Silent WayMiles Davis218348 2023_3QIn a Silent WayMiles Davis8091,092 Let’s check the result and explain what a running total is. A running total is similar to a cumulative total (or sum), but they’re not the same. The cumulative total will give you the sum of the current row and all the previous rows, i.e., the window frame increases with each row. A running total is a sum within a defined window frame that stays the same size but moves with each row. In our case, the window is defined as the current row and the two previous rows. Take a look at the highlighted values. The running total for the first quarter in 2022 is 428, the same as the individual sale. There are no previous rows, so the running total includes only the current row. The next running total is 428 + 1,053 = 1,481. It sums the current and the previous row, as there is only one previous row. The running total for the third quarter in 2022 is 428 + 1,053 + 19 = 1,500. This is the first time you get the whole window, i.e., the current row and the two previous rows. As you go to the next row, the window will move but its size will remain the same. The running total for the following quarter is 428 + 1,053 + 19 + 65 = 1,137. It, again, involves the current row and the two previous rows – but different ones compared to the quarter. Solve this exercise for practice Rewrite the above query so it calculates the running total for the album ‘The New Folk Sound of Terry Callier’. The running total should be calculated from the earliest to the latest quarter. It should include four quarters: the two previous, the current quarter, and the one following. Also, show the sales period, album title, artist, and the number of copies sold. Solution: SELECT sales_period, album_title, artist, copies_sold, SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS sales_running_total FROM album_catalogue WHERE album_title = 'The New Folk Sound of Terry Callier'; Output: sales_periodalbum_titleartistcopies_soldsales_running_total 2022_1QThe New Folk Sound of Terry CallierTerry Callier9032,575 2022_2QThe New Folk Sound of Terry CallierTerry Callier4182,789 2022_3QThe New Folk Sound of Terry CallierTerry Callier1,2543,003 2022_4QThe New Folk Sound of Terry CallierTerry Callier2142,641 2023_1QThe New Folk Sound of Terry CallierTerry Callier2142,506 2023_2QThe New Folk Sound of Terry CallierTerry Callier5411,252 2023_3QThe New Folk Sound of Terry CallierTerry Callier2831,038 Example #4: Quarter-To-Quarter Difference In this last example, I’ll show how to use the window functions to calculate the sales difference between quarters: SELECT *, LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS previous_quarter_sales, quarterly_copies_sold - LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS quarterly_sales_difference FROM (SELECT sales_period, SUM(copies_sold) AS quarterly_copies_sold FROM album_catalogue GROUP BY sales_period) AS quarterly_sales; I first write a subquery that calculates the total sales for each quarter. I use the SUM() aggregate function and group the results by the sales period. Next, I select all the data from the subquery in the main query. Now I need to get the previous quarter's sales. I’ll write the LAG() window function, which is used to access values from the previous rows. The value I want to access is specified in the function. In this case, it’s the quarterly copies sold from the subquery. By defining the offset argument, the function allows me to define how far back I want to go. I didn’t define it, so the default offset is one. In other words, the function will get the data from the previous row/quarter. But if you want to go two rows/quarters back, then you would write LAG(quarterly_copies_sold, 2). I also use ORDER BY in OVER() to make sure the values within the frame are sorted from the oldest to the latest quarter. This use of the window function is so that it’s clearer what I’ll do in the next code line. This is where the actual calculation of comparing the current and previous quarter’s sales is done. It’s simple now: subtract the window function defined above from the column quarterly_copies_sold. Here’s the output: sales_periodquarterly_copies_soldprevious_quarter_salesquarterly_sales_difference 2022_1Q9,519NULLNULL 2022_2Q7,5819,519-1,938 2022_3Q4,2737,581-3,308 2022_4Q7,4034,2733,130 2023_1Q7,4037,4030 2023_2Q4,9567,403-2,447 2023_3Q7,7954,9562,839 There are no previous values for 2022_1Q, as there is no previous quarter. The quarterly sales for 2022_2Q are 7,581. The sales in the previous quarter were 9,519. The calculation shows that the current sales are 1,938 copies (7,581 - 9,519) below the previous quarter's sales. You can analyze the rest of the output the same way. Solve This Exercise for Practice Rewrite the above query so it shows the difference between quarterly sales on a year-over-year basis – e.g., compare the first quarter of 2023 with the first quarter of 2022. Show the sales period, copies sold in the quarter, sales for the same quarter in the previous year, and the year-over-year difference between the quarters. Solution: SELECT *, LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_sales, quarterly_copies_sold - LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_difference FROM (SELECT sales_period, SUM(copies_sold) AS quarterly_copies_sold FROM album_catalogue GROUP BY sales_period) AS quarterly_sales; Output: sales_periodquarterly_copies_soldyear_over_year_salesyear_over_year_difference 2022_1Q9,519NULLNULL 2022_2Q7,581NULLNULL 2022_3Q4,273NULLNULL 2022_4Q7,403NULLNULL 2023_1Q7,4039,519-2,116 2023_2Q4,9567,581-2,625 2023_3Q7,7954,2733,522 If you want more, here are additional window function examples. For practice materials, take a look at these 11 SQL window functions exercises. SQL Window Functions: A Window to Better Data Analysis This turned out to be a pretty comprehensive article about SQL window functions. You learned the window functions and how each crucial part of their syntax works. You also know there are several categories of window functions. The most commonly used are aggregate, ranking, and analytical window functions. The practical examples showed you how window functions can be used in common data analysis tasks. Hopefully, you didn’t skip the exercises in the article. If you did, I once again recommend that you solve them. Only through practice can you really bring home what SQL window functions are about. The richest resource for learning and practicing is our Window Functions course. It is an interactive course that has over 200 hands-on exercises and covers the full syntax of window functions. If you have job interviews lined up, make sure you go through these SQL window functions interview questions. Good luck, and keep on learning SQL! Tags: sql window functions