8th Sep 2020 9 minutes read Six Examples Using MySQL Window Functions Dorota Wdzięczna sql learn sql window functions Table of Contents What Is a Window Function? Example 1: The RANK() Function Example 2: The DENSE_RANK() Function Example 3: The ROW_NUMBER() Function Ranking Within Each Partition Example 4: The LEAD() Function Example 5: The LAG() Function Example 6: The Running Total Using Window Functions in MySQL Window functions are an advanced SQL feature available in most popular databases. MySQL had not supported them for a long time, but that changed in Version 8.0. They are helpful not only for analysts and people who create reports, but also for other professionals who use databases to select data needed. In this article, we explain the syntax of some popular window functions with practical examples. What Is a Window Function? Window functions operate on a window frame, or a set of rows that are somehow related to the current row. They are similar to GROUP BY, because they compute aggregate values for a group of rows. However, unlike GROUP BY, they do not collapse rows; instead, they keep the details of individual rows. A very useful feature especially for data analysts, marketers, and financial specialists, most popular relational databases support window functions today. MySQL had not supported them until it finally caught up starting in Version 8.0, when they were added as a new advanced feature. For people who have worked exclusively in MySQL, window functions may be something completely new. If you’re a MySQL specialist who has been using it since before Version 8.0, learn this new feature! Find out why you should in the article “Why Should I Learn SQL Window Functions?”. How do window functions work? Consider the table sale shown below, with columns product ID (ID), price (value), and year of sale (sale_year). Suppose we want to add the column prev_value for each product, containing the value of the previous year. See the picture below, which shows how the window function LAG() works: The query looks like this: SELECT ID, value, sale_year, LAG(value) OVER( PARTITION BY ID ORDER BY year ) AS prev_value FROM sales; The query divides the result set into partitions: one for ID 1 and the other for ID 2. Within each partition, the sale value from the previous year is obtained. Note that the year 2016 for ID 2 is the first row of the partition and therefore has no previous row within the partition; consequently, the prev_value column is NULL. For the same ID, the year 2017 is the second row in the partition; the prev_value is 80.00, which is the value from the year 2016. As you can see, we specify an OVER clause after naming the window function. Within this clause, we use PARTITION BY to define the partitions and ORDER BY to define the sorting within each partition. Other window functions work much in the same way. Some window functions are ranking functions like RANK(), DENSE_RANK(), and ROW_NUMBER(), while others are analytic functions like LAG() and LEAD(). Ranking functions return a rank value for each row within each partition; in contrast, analytic functions point to preceding or subsequent rows within each partition. If no partitions are defined, all rows are treated as one big partition. Let’s look at some examples! Example 1: The RANK() Function The RANK() function is used mainly to create reports. It computes the rank for each row in the result set in the order specified. The ranks are sequential numbers starting from 1. When there are ties (i.e., multiple rows with the same value in the column used to order), these rows are assigned the same rank. In this case, the rank of the next row will have skipped some numbers according to the quantity of the tied rows. For this reason, the values returned by RANK() are not necessarily consecutive numbers. Let’s see an example. Consider the following table, product: idnamecategoryranking_score 1Sofa Alanliving room3422 2Desk Mirianoffice1777 3Sofa Frankliving room1777 4Armchair Ivoliving room1201 5Cabinet AWEoffice4547 6Armchair Alexliving room1201 The query below displays the columns name, category, and ranking_score. It also ranks the rows using the RANK() function in the order defined by the column ranking_score: SELECT RANK() OVER(ORDER BY ranking_score) AS rank_number, name, category, ranking_score FROM product; Here is the result of the query: rank_numbernamecategoryranking_score 1Armchair Ivoliving room1201 1Armchair Alexliving room1201 3Desk Mirianoffice1777 3Sofa Frankliving room1777 5Sofa Alanliving room3422 6Cabinet AWEoffice4547 After the RANK(), we have an OVER() clause with an ORDER BY. The ORDER BY is mandatory for ranking functions. Here, the rows are sorted in ascending order according to the column ranking_score. The order is ascending by default; you may use ASC at the end of the ORDER BY clause to clarify the ascending order, but it is not necessary. The first two products, Armchair Ivo and Armchair Alex, both have ranking_score equal to 1201, so they are tied at rank number 1. The rank for the next value of ranking_score, 1777, is 3 and not 2, because there are 2 rows with the rank 1. The ranks start at 1, but the next rank has skipped according to the repeated ranks of the previous rows. Example 2: The DENSE_RANK() Function DENSE_RANK() is similar to RANK(), but it does not allow gaps in ranks in the way RANK() does. Let’s see an example. SELECT DENSE_RANK() OVER(ORDER BY ranking_score DESC) AS dense_rank_number, name, category, ranking_score FROM product; This query displays the same columns as in the previous example, but with dense_rank_number defined by DENSE_RANK() instead of rank_number defined by RANK(). The rows are still sorted by the column ranking_score, but this time in descending order denoted by DESC at the end of the ORDER BY clause. The Desk Mirian and Sofa Frank have the same ranking_score (1777) and are assigned a dense_rank_number of 3. DENSE_RANK() returns 4 next and doesn't account for the repeated ranks as does the RANK() function. The article “Overview of Ranking Functions in SQL” explains the difference between these functions. The result of this query looks like this: dense_rank_numbe rnamecategoryranking_score 1Cabinet AWEoffice4547 2Sofa Alanliving room3422 3Desk Mirianoffice1777 3Sofa Frankliving room1777 4Armchair Ivoliving room1201 4Armchair Alexliving room1201 Example 3: The ROW_NUMBER() Function Another popular ranking function used in databases is ROW_NUMBER(). It simply assigns consecutive numbers to each row in a specified order. Let’s see an example: SELECT ROW_NUMBER() OVER(ORDER BY ranking_score) AS row_number, name, category, ranking_score FROM product; The query first orders the rows by ranking_score in ascending order. It then assigns row numbers consecutively starting with 1. The rows with ties in ranking_score are assigned different row numbers, effectively ignoring the ties. The result of this query looks like this: row_numbernamecategoryranking_score 1Armchair Ivoliving room1201 2Armchair Alexliving room1201 3Desk Mirianoffice1777 4Sofa Frankliving room1777 5Sofa Alanliving room3422 6Cabinet AWEoffice4547 Ranking Within Each Partition You can assign ranks separately within each partition with RANK(), DENSE_RANK(), or ROW_NUMBER(). This is done by dividing rows into partitions by a column or a combination of columns then assigning ranks independently within each partition. Here is an example: SELECT RANK() OVER(PARTITION BY category ORDER BY ranking_score) AS rank_number, name, category, ranking_score FROM product; This query defines ranks separately within each category. For the category “living room,” the rank starts at 1 and ends at 4; in the “office” category, the rank starts at 1 and ends at 2. Here is the result of this query: rank_numbernamecategoryranking_score 1Armchair Ivoliving room1201 1Armchair Alexliving room1201 3Sofa Frankliving room1777 4Sofa Alanliving room3422 1Desk Mirianoffice1777 2Cabinet AWEoffice4547 The rank numbering is defined separately for each category, since each category is a partition (PARTITION BY category). The records are sorted by ranking_score (ORDER BY ranking_score) within each partition (category), and the ranks are calculated within each partition. Example 4: The LEAD() Function Analytical functions are a different type of window functions. They are used to compute a value within a group of rows, returning some value from a preceding row or from a subsequent row within the row group. Analytical functions are useful for finding differences in a particular column between a given row and some preceding or subsequent row. For example, you can compare the difference of a particular column between the current row and the adjacent row. The first analytic function we will examine is LEAD(). In its simplest form, LEAD() takes the value of a given column stored in the next row. Consider the table below, toys_sale: idtoy_namemonthsale_value 1robot323455 2robot412345 3robot523000 4kite36890 5kite47600 6kite59120 7ball345123 8ball442000 9ball520300 10puzzle567000 and the following query: SELECT toy_name, month, sale_value, LEAD(sale_value) OVER(PARTITION BY toy_name ORDER BY month) AS next_month_value FROM toys_sale; The LEAD() function returns the value of sale_value in the next row. We have partitions in this example—the name of the toy—so only the rows within the same partition are considered. In contrast to ranking functions, LEAD() takes an argument—the name of the column whose value it will return from the next row. Here is the result of the query: toy_namemonthsale_valuenext_month_value ball34512342000 ball44200020300 ball520300NULL kite368907600 kite476009120 kite59120NULL puzzle567000NULL robot32345512345 robot41234523000 robot523000NULL For the last row in the partition, the next value is always NULL. Look at Month 5 for all of the toys. This is because Month 5 is the last month in the partition. There is no month that follows Month 5 in the partition, and the next sale_value does not exist. Example 5: The LAG() Function LAG() is similar to LEAD(). Both LEAD() and LAG() can be used to compute the difference between a given row and another row. The difference is that LAG() returns the value from a preceding row, whereas LEAD() returns the value from a subsequent row. Take a look at the following query: SELECT toy_name, month, sale_value, LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month) AS prev_month_value, LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month)- sale_value as difference FROM toys_sale; Each toy name is a partition. Within each partition, the query returns the sale value from the previous month and stores it in the column prev_month_value. For Month 5, prev_month_value contains the sale value from Month 4. It is NULL for Month 3, because there is no previous month within the partition. In addition, this query calculates the difference in sale values between the current month and the previous month and stores the difference in the column difference. Here is the result: toy_namemonthsale_valueprev_month_valuedifference ball345123NULLNULL ball442000451233123 ball5203004200021700 kite36890NULLNULL kite476006890-710 kite591207600-1520 puzzle567000NULLNULL robot323455NULLNULL robot4123452345511110 robot52300012345-10655 Example 6: The Running Total You can also calculate running totals in MySQL using the SUM() function. Take a look at this query: SELECT toy_name, month, sale_value, SUM(sale_value) OVER(PARTITION BY toy_name ORDER BY month) AS total_toy_value FROM toys_sale; It calculates the cumulative sum of the sale values for each toy and saves it into a new column, total_toy_sale. For Month 3, it is simply the sale value of that month, because it is the first row in the partition. However, for Month 4, it is the sum of sale values from Months 3 and 4; for Month 5, it is the sum of sale values from Months 3, 4, and 5. This calculation is called the running total, because the sum is updated with each successive row. toy_namemonthsale_valuetotal_toy_value ball34512345123 ball44200087123 ball520300107423 kite368906890 kite4760014490 kite5912023610 puzzle56700067000 robot32345523455 robot41234535800 robot52300058800 Using Window Functions in MySQL We have seen several examples of window functions. Most relational databases today support them, and MySQL finally caught up in Version 8.0. If you are not familiar with them, you should really consider learning them. They are useful especially for marketers and data analysts. If you are interested in learning more about window functions, our interactive course “Window Functions” and our “SQL Window Functions Cheat Sheet” on LearnSQL.com are good resources. I hope this article has whetted your appetite to search and learn more about SQL window functions! Tags: sql learn sql window functions