17th Jul 2020 11 minutes read The LAG Function and the LEAD Function in SQL Dorota Wdzięczna sql learn sql window functions Table of Contents The Syntax of the LAG Function The Syntax of the LEAD Function Using LAG() and LEAD() to Compare Values Using LAG() and LEAD() With a Specific Offset Using LAG() and LEAD() With a Default Value Using LAG() and LEAD() With Partitions LAG and LEAD Functions Are Very Useful! LAG() and LEAD() are positional functions. A positional function is a type of window function. If you are not familiar with when and how to use them, what the syntax of each function is, why you might want to use them, and what the differences are, read on! LAG() and LEAD() are positional functions. These are window functions and are very useful in creating reports, because they can refer to data from rows above or below the current row. In this article, we will look at these two in detail. To learn how window functions work, what functions there are, and how to apply them to real-world problems, it’s best to take our Window Functions course. You can find all about the course here. It’s interactive, there are 218 exercises, and you only need a web browser and some basic SQL knowledge. The Syntax of the LAG Function The LAG() function allows access to a value stored in a different row above the current row. The row above may be adjacent or some number of rows above, as sorted by a specified column or set of columns. Let’s look its syntax: LAG(expression [,offset[,default_value]]) OVER(ORDER BY columns) LAG() takes three arguments: the name of the column or an expression from which the value is obtained, the number of rows to skip (offset) above, and the default value to be returned if the stored value obtained from the row above is empty. Only the first argument is required. The third argument (default value) is allowed only if you specify the second argument, the offset. As with other window functions, LAG() requires the OVER clause. It can take optional parameters, which we will explain later. With LAG(), you must specify an ORDER BY in the OVER clause, with a column or a list of columns by which the rows should be sorted. Let’s consider the following table, sale: idseller_namesale_value 3Stef7000 1Alice12000 2Mili25000 And the following query with a LAG() function: SELECT seller_name, sale_value, LAG(sale_value) OVER(ORDER BY sale_value) as previous_sale_value FROM sale; Here is the result: seller_namesale_valueprevious_sale_value Stef7000NULL Alice120007000 Mili2500012000 This simplest use of LAG() displays the value from the adjacent row above. For example, the second record displays Alice’s sale amount ($12,000) with Stef’s ($7,000) from the row above, in columns sale_value and previous_sale_value, respectively. Notice that the first row does not have an adjacent row above, and consequently the previous_sale_value field is empty (NULL), since the row from which the value of sale_value should be obtained does not exist. If you specify only the required argument (the name of the column or other expression) as we have in this example, the offset argument defaults to 1 and the third argument defaults to NULL. In our example, the first row in the result set has NULL in previous_sale_value and in the other rows are the values from the respective rows immediately above, because the offset is 1. Using LAG(), you can see the value for the current row as well as the value from the adjacent row above. You can use this, for example, to check the sale amount of a given row against that of the previous row with the amount of sale sorted from the lowest to the highest. The illustration below shows how the value from the adjacent row above is appended to the current row. The LAG() function is included in our handy “SQL Window Functions Cheat Sheet”. The Syntax of the LEAD Function LEAD() is similar to LAG(). Whereas LAG() accesses a value stored in a row above, LEAD() accesses a value stored in a row below. The syntax of LEAD() is just like that of LAG(): LEAD(expression [,offset[,default_value]]) OVER(ORDER BY columns) Just like LAG(), the LEAD() function takes three arguments: the name of a column or an expression, the offset to be skipped below, and the default value to be returned if the stored value obtained from the row below is empty. Only the first argument is required. The third argument, the default value, can be specified only if you specify the second argument, the offset. Just like LAG(), LEAD() is a window function and requires an OVER clause. And as with LAG(), LEAD() must be accompanied by an ORDER BY in the OVER clause. We again look at the table, sale: idseller_namesale_value 3Stef7000 1Alice12000 2Mili25000 Here’s a query with a LEAD() function: SELECT seller_name, sale_value, LEAD(sale_value) OVER(ORDER BY sale_value) as next_sale_value FROM sale; Here is the result set: seller_namesale_valuenext_sale_value Stef700012000 Alice1200025000 Mili25000NULL The rows are sorted by the column specified in ORDER BY (sale_value). The LEAD() function grabs the sale amount from the row below. For example, Stef’s own sale amount is $7,000 in the column sale_value, and the column next_sale_value in the same record contains $12,000. The latter comes from the sale_value column for Alice, the seller in the next row. Note that the last row does not have a next row, so the next_sale_value field is empty (NULL) for the last row. If you specify only the required argument, that is, only the name of the column or other expression, the offset defaults to 1 and the third argument defaults to NULL. In our example, the value for Alice’s next_sale_value is from the column sale_value of the adjacent row below, since the default offset is 1. Using LEAD(), you can compare values across rows. The following illustration shows how the amount returned by LEAD() is appended to the current row. Using LAG() and LEAD() to Compare Values An important use for LAG() and LEAD() in reports is comparing the values in the current row with the values in the same column but in a row above or below. Consider the following table, annual_sale, shown below: yeartotal_sale 201523000 201625000 201734000 201832000 201933000 As you can see, this table contains the total sale amount by year. Using LAG() and LEAD(), we can compare annual sale amounts across years. Let’s look at this query: SELECT year, total_sale AS current_total_sale, LAG(total_sale) OVER(ORDER BY year) AS previous_total_sale, total_sale - LAG(total_sale) OVER(ORDER BY year) AS difference FROM annual_sale; Here is the result set: yearcurrent_total_saleprevious_total_saledifference 201523000NULLNULL 201625000230002000 201734000250009000 20183200034000-2000 201933000320001000 This query grabs the sale amount from the previous year and puts it into the column previous_total_sale using the LAG() function. The ORDER BY in the OVER clause orders the records by year, ensuring that the adjacent row above represents the previous year. Then it takes the amount from the current_total_sale column in the previous row and brings it over to the current row. This query also calculates the difference in sale amount between the current year and the previous year. This can help us understand if there was an increase (positive difference) or a decrease (negative difference) in sales from one year to the next. For 2015, we have no information about the previous year. Therefore, the value returned by the LAG() function is NULL and so is the difference. Now, the total sale in 2018 was $32,000, but it was $34,000 in 2017 (the previous year) as shown in the column previous_total_sale. The difference is -$2,000, indicating that in 2018, there was a decrease in sales of $2,000 compared to the year 2017. Using LAG() and LEAD() With a Specific Offset You can use LAG() and LEAD() functions with two arguments: the name of the column and the offset. Consider the following table, employee: employee_idyearquarterbonus 120171100 120172250 12017360 12017420 12018180 12018280 1201830 1201840 1201910 120192100 1201930 120194150 The query below selects the bonus for the employee with ID=1 for each quarter of each year. It then identifies the bonuses for the corresponding quarter in the year before and the year after. SELECT year, quarter, LAG(bonus,4) OVER(ORDER BY year,quarter) AS previous_bonus, bonus AS current_bonus, LEAD(bonus,4) OVER(ORDER BY year,quarter) AS next_bonus FROM employee WHERE employee_id=1; This query returns the following result set: yearquarterprevious_bonuscurrent_bonusnext_bonus 20171NULL10080 20172NULL25080 20173NULL600 20174NULL200 20181100800 2018225080100 201836000 20184200150 20191800NULL 2019280100NULL 2019300NULL 201940150NULL The rows highlighted in green are the records for the first quarter of each year, the rows in white the second quarter of each year, etc. In any given row, the previous and the next bonus amounts are taken from the corresponding quarter of the year prior and the year following, they are assigned to columns previous_bonus and next_bonus, respectively. For example, the employee ID=1 received an $80 bonus in the first quarter of 2018. For the same employee, the 2017 first quarter bonus was $100, and the 2019 first quarter bonus was $0. The ORDER BY specifies that the rows be sorted by year and quarter. An offset of 4 tells LEAD() and LAG() to skip 4 rows before and after the current row, respectively. With this offset, you can compare values of the same quarter from different years, because there are 4 quarters in a year. The picture below illustrates this idea. You can find more about LAG() and LEAD() functions in our articles “Common SQL Window Functions: Positional Functions” by Aldo Zelen and “When Do I Use SQL Window Functions?” by Tihomir Babic. Using LAG() and LEAD() With a Default Value In the previous section, we discussed how to use the offset argument in LAG() and LEAD(). Now we consider cases with a third argument: the default value to assign when the value obtained is NULL. To specify this argument, you must also specify the second argument, the offset. The default offset is 1, so specify 1 to keep the default offset or some other value appropriate for your case. Let’s look at another example. The following table, sale_product, contains the product IDs, the month (1 = January, 2 = February, etc.), and the count sold by month. Here are the records for which the product ID is 1. product_idmonthcount 11125 12135 13NULL 1490 The query: SELECT product_id, month, LAG(count,1,0) OVER(ORDER BY month) AS previous_count, count AS current_count, count - LAG(count,1,0) OVER(ORDER BY month) AS difference FROM sale_product WHERE product_id=1; returns the result: product_idmonthprevious_countcurrent_countdifference 110125125 1212513510 13135NULLNULL 14NULL90NULL For the product with ID=1, we select the sale month, the sale count for this month (current_count), and the sale count from the previous month (the value from the previous row returned by LAG()). We would like to show zero instead of NULL when LAG() tries to obtain values from rows beyond those that exist in our data set. For both LAG() and LEAD(), this is done by specifying a third argument, the default value. Remember that the offset argument is required in order to specify the default value argument; here, we specify an offset of 1 to look at the row above. We then specify 0 as the third argument. This sets to zero any attempt to obtain values from rows that do not exist, as is the case here for the first row (there is no row above the first row). Note that the default value of zero is assigned only for rows that do not exist; the rows whose adjacent rows above do exist but with NULLs in current_count are left alone as NULLs instead of changing them to 0. You can see this in the row where month=4: although the current_count for the row above (month=3) is NULL, it is not replaced with a zero, since the previous row does exist and only happens to contain a NULL in current_count. Using LAG() and LEAD() With Partitions Let’s continue with the same example, but now we examine a case in which we need PARTITION BY in the OVER clause. Below is the next part of the table sale_product with another product whose ID=2. product_idmonthcount 11125 12135 13NULL 1490 21150 22100 23185 24190 The query: SELECT product_id, month, LAG(count,1,0) OVER(PARTITION BY product_id ORDER BY month) AS previous_count, count AS current_count, count - LAG(count,1,0) OVER(PARTITION BY product_id ORDER BY month) AS difference FROM sale_product; returns the result: product_idmonthprevious_countcurrent_countdifference 110125125 1212513510 13135NULLNULL 14NULL90NULL 210150150 22150100-50 2310018585 241851905 We have multiple products in this table. To calculate the differences between the current sales and the previous sales separately for each product, we specify PARTITION BY before ORDER BY in the OVER clause. You can name a column or a list of columns in PARTITION BY. Here, we use the product_id column to divide the records into partitions then sort by month within each partition. As a result, every partition starts with Month 1 and ends with Month 4. PARTITION BY is similar to GROUP BY in that it groups the same values together. Unlike GROUP BY, however, PARTITION BY does not collapse the original rows into a single row; the original rows remain accessible. You can read more about the difference between these PARTITION BY and GROUP BY in the article “What Is the Difference Between a GROUP BY and a PARTITION BY?”. LAG and LEAD Functions Are Very Useful! Positional functions such as LAG() and LEAD() are useful in many situations. They are often used in creating reports, because they can refer to the rows above or below, as we have seen from these examples. I hope that this article helps you extend your SQL knowledge of window functions. Read more on positional functions in the articles “Common SQL Window Functions: Positional Functions” by Aldo Zelen and “When Do I Use SQL Window Functions?” by Tihomir Babic. If you are interested in learning more about window functions, try our interactive course “Window Functions” on the LearnSQL.com platform. Tags: sql learn sql window functions