Common SQL Window Functions: Positional Functions

Aldo Zelen, 31st Aug 2017

Positional SQL window functions deal with data's location in the set. In this post, we explain LEAD, LAG, and other positional functions.

SQL window functions allow us to aggregate data while still using individual row values. We've already dealt with ranking functions and the use of partitions. In this post, we'll examine positional window functions, which are extremely helpful in reporting and summarizing data. Specifically, we'll look at LAG, LEAD, FIRST_VALUE and LAST_VALUE. It is worth mentioning that LEAD mirrors LAG and FIRST_VALUE mirrors LAST_VALUE. In some ways, these pairs can be used interchangeably.

We're getting into a more complex area of SQL window functions now. As we go through this post, you'll notice that some problems are solved more easily with window functions than with complex combinations of nested subqueries. Don't worry if you feel a little overwhelmed at first. The topic isn't easy, and it takes time to master. Just keep learning and practicing!

Introducing SQL's Positional Window Functions

Once again, we'll be working with the sales table. This table tracks basic sales data using the following columns:

id – The primary key of the table.
salesman_id – The ID of the person who made the sale.
sales_item – The ID of the item sold.
sales_num – The number of items sold.
sales_price – The price per individual item.
datetime – The date of the sale.

Now, let's start with the FIRST_VALUE function.

FIRST_VALUE

First, we are going to look at the analytical function called FIRST_VALUE. As the name suggests, this function returns the first value in an ordered list of values.
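As a minimal sketch of that behavior, the snippet below runs FIRST_VALUE against a tiny in-memory SQLite table through Python's sqlite3 module. The three-column table, the precomputed month column, the helper name top_salesman_per_month, and the handful of sample rows are simplifications assumed for illustration; they are not the full sales table described above. Window functions also require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows: (salesman_id, sales_num, month).
SAMPLE = [(10, 22, 1), (70, 2, 1), (10, 550, 2), (30, 50, 2)]


def top_salesman_per_month(rows):
    """Return (month, salesman_id, sales_num, top_salesman) for every row."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the sales table (assumption, not the real schema).
    conn.execute(
        "CREATE TABLE sales (salesman_id INTEGER, sales_num INTEGER, month INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT month, salesman_id, sales_num,
               first_value(salesman_id) OVER (
                   PARTITION BY month      -- one window per month
                   ORDER BY sales_num DESC -- largest sale comes first
               ) AS top_salesman
        FROM sales
        ORDER BY month, sales_num DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_salesman_per_month(SAMPLE):
        print(row)
```

Every row keeps its own values, and the extra column repeats the salesman_id found on the first row of that month's window.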
Suppose we want to extract the first six months of data from our sales table and, next to every row, show the salesman with the biggest sale volume for that month. To accomplish this, we use the FIRST_VALUE function with the following arguments:

    first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc)

We are feeding the FIRST_VALUE function the salesman_id argument over a window of one month, where the data is sorted in descending order by the sales_num column. We expect the salesman ID with the largest associated sale volume to be returned for each month.

    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           extract(MONTH FROM datetime) month,
           first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY extract(MONTH FROM datetime);

Does the result live up to the expectation?

    SALESMAN_ID  SALES_ITEM  SALES_NUM  SALES_PRICE  MONTH  TOP_SALESMAN
    10           8           22         160.5        1      10
    70           80          2          60.5         1      10
    10           6           550        16.1         2      10
    30           25          50         26.1         2      10
    30           25          330        12.1         3      30
    10           10          50         55.1         3      30
    40           200         40         10.5         3      30
    70           200         22         10.5         3      30
    10           200         20         2.1          3      30
    10           200         20         1.1          3      30
    50           200         15         110          3      30
    60           200         8          30.5         3      30
    30           200         7          50           3      30
    10           100         4          50.5         3      30
    10           40          40         5.1          4      10
    30           25          30         2.1          5      30
    50           600         215        10           6      50

Try It Yourself: You can simulate the results of FIRST_VALUE with other functions. See if you can find a way that does not use the RANK function.

LAST_VALUE

LAST_VALUE is the opposite of FIRST_VALUE; it returns the last value of an ordered list of values.
Let's take the previous example and simply replace FIRST_VALUE with LAST_VALUE:

    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           extract(MONTH FROM datetime) month,
           last_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY extract(MONTH FROM datetime);

We get this result:

    SALESMAN_ID  SALES_ITEM  SALES_NUM  SALES_PRICE  MONTH  TOP_SALESMAN
    10           8           22         160.5        1      10
    70           80          2          60.5         1      70
    10           6           550        16.1         2      10
    30           25          50         26.1         2      30
    30           25          330        12.1         3      30
    10           10          50         55.1         3      10
    40           200         40         10.5         3      40
    70           200         22         10.5         3      70
    10           200         20         2.1          3      10
    10           200         20         1.1          3      10
    50           200         15         110          3      50
    60           200         8          30.5         3      60
    30           200         7          50           3      30
    10           100         4          50.5         3      10
    10           40          40         5.1          4      10
    30           25          30         2.1          5      30
    50           600         215        10           6      50

Look closely: the last column repeats each row's own salesman_id rather than the salesman with the smallest sales_num in the month. When a window has an ORDER BY but no explicit frame, the default frame ends at the current row (and any rows tied with it), so the "last" value seen so far is the current row's value. To get the last value of the whole partition, extend the frame with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Try It Yourself: LAST_VALUE and FIRST_VALUE are interchangeable – you can get the result you expect from LAST_VALUE by using FIRST_VALUE. I will leave it up to you to figure out how.

There, that wasn't too bad. Let's move on to something more complex: the LEAD and LAG functions. Note that, like FIRST_VALUE and LAST_VALUE, these have an opposite yet interchangeable relationship. They usually serve as helper functions in calculations that show the difference between the current value and the leading or lagging value. We'll start with the LAG function.

LAG

LAG lets you access the information in another row without using a self join. You simply specify the column name (the value) and how many rows back the data is. Once again, we are using this function with a window of data.

Important: LAG returns the value from a row located before the current row in the specified sorting order.
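To make the self-join comparison concrete, here is a hedged sketch that computes "the previous row's sales_num" twice: once with LAG and once with a LEFT JOIN of the table to itself. It runs on an in-memory SQLite database via Python's sqlite3 module (SQLite 3.25 or newer is assumed). The two-column table, the gap-free seq column that defines the row order, and the helper name previous_sales are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical sample rows: (seq, sales_num), where seq is a gap-free ordering key.
SAMPLE = [(1, 22), (2, 550), (3, 4)]


def previous_sales(rows):
    """Return the previous row's sales_num computed via LAG and via a self join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (seq INTEGER PRIMARY KEY, sales_num INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Window-function version: look one row back, default to 0 on the first row.
    with_lag = conn.execute("""
        SELECT seq, sales_num,
               lag(sales_num, 1, 0) OVER (ORDER BY seq) AS prev_sales
        FROM sales
        ORDER BY seq
    """).fetchall()

    # Self-join version: match each row with the row whose seq is one lower.
    # This only works because seq has no gaps (an assumption of this sketch).
    with_join = conn.execute("""
        SELECT cur.seq, cur.sales_num,
               coalesce(prev.sales_num, 0) AS prev_sales
        FROM sales AS cur
        LEFT JOIN sales AS prev ON prev.seq = cur.seq - 1
        ORDER BY cur.seq
    """).fetchall()

    conn.close()
    return with_lag, with_join


if __name__ == "__main__":
    lag_rows, join_rows = previous_sales(SAMPLE)
    print(lag_rows)
    print(join_rows)
```

Both queries return the same rows for this data, but the LAG version needs no gap-free key and no second copy of the table.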
The syntax of the function looks like this:

    LAG(value, offset, default_value)

value is the column we want to access.
offset is the number of rows to go back from the current row to reach the value.
default_value is what the function returns when no such row exists (for example, on the first row of a partition).

As noted, LEAD and LAG are usually used in SQL queries as helper functions in calculations that show the difference between the current value and the leading or lagging value.

Let's try a simple LAG function on our sales dataset:

    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           datetime as DATE,
           lag(sales_num,1,0) over (partition by salesman_id order by datetime desc) as LAG_SALES
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY salesman_id, datetime desc;

Here the LAG_SALES column contains the sales numbers lagged along the datetime column. Because the window is sorted by datetime in descending order, the "previous" row is the salesman's next more recent sale. The column is reset for every salesman.

    SALESMAN_ID  SALES_ITEM  SALES_NUM  SALES_PRICE  DATE       LAG_SALES
    10           40          40         5.1          14-APR-17  0
    10           200         20         2.1          20-MAR-17  40
    10           10          50         55.1         14-MAR-17  20
    10           200         20         1.1          07-MAR-17  50
    10           100         4          50.5         01-MAR-17  20
    10           6           550        16.1         07-FEB-17  4
    10           8           22         160.5        15-JAN-17  550
    30           25          30         2.1          25-MAY-17  0
    30           25          330        12.1         25-MAR-17  30
    30           200         7          50           01-MAR-17  330
    30           25          50         26.1         03-FEB-17  7
    40           200         40         10.5         02-MAR-17  0
    50           600         215        10           03-JUN-17  0
    50           200         15         110          03-MAR-17  215
    60           200         8          30.5         04-MAR-17  0
    70           200         22         10.5         05-MAR-17  0
    70           80          2          60.5         05-JAN-17  22

We can try a more complex example. We know who our top salesmen are in terms of volume; let's explore the difference between the sales volume of each sale and that of the sale ranked just above it. We'll use a six-month timeframe again and compute this in two steps.

Step 1

In this step, we calculate the lagged sales numbers. We gather the sales_num values, offset by 1. If no row precedes the current one, we put in the default value of 0.
    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           extract(MONTH FROM datetime) month,
           lag(sales_num,1,0) over (order by sales_num desc) as LAG_SALES
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY sales_num desc;

Here is the result:

    SALESMAN_ID  SALES_ITEM  SALES_NUM  SALES_PRICE  MONTH  LAG_SALES
    10           6           550        16.1         2      0
    30           25          330        12.1         3      550
    50           600         215        10           6      330
    30           25          50         26.1         2      215
    10           10          50         55.1         3      50
    40           200         40         10.5         3      50
    10           40          40         5.1          4      40
    30           25          30         2.1          5      40
    10           8           22         160.5        1      30
    70           200         22         10.5         3      22
    10           200         20         2.1          3      22
    10           200         20         1.1          3      20
    50           200         15         110          3      20
    60           200         8          30.5         3      15
    30           200         7          50           3      8
    10           100         4          50.5         3      7
    70           80          2          60.5         1      4

Step 2

Now we will compute the difference between the volume of the current row and the volume of the previous row:

    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           extract(MONTH FROM datetime) month,
           ABS(lag(sales_num,1,0) over (order by sales_num desc)-sales_num) as DIFFERENCE_SALES
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY sales_num desc;

The result is:

    SALESMAN_ID  SALES_ITEM  SALES_NUM  SALES_PRICE  MONTH  DIFFERENCE_SALES
    10           6           550        16.1         2      550
    30           25          330        12.1         3      220
    50           600         215        10           6      115
    30           25          50         26.1         2      165
    10           10          50         55.1         3      0
    40           200         40         10.5         3      10
    10           40          40         5.1          4      0
    30           25          30         2.1          5      10
    10           8           22         160.5        1      8
    70           200         22         10.5         3      0
    10           200         20         2.1          3      2
    10           200         20         1.1          3      0
    50           200         15         110          3      5
    60           200         8          30.5         3      7
    30           200         7          50           3      1
    10           100         4          50.5         3      3
    70           80          2          60.5         1      2

Now we can continue our analysis on the new column and see some interesting trends. The top performers outsell the rest by a wide margin: the gaps at the top of the list are in the hundreds, while further down they shrink to single digits. A difference of zero means that two consecutive sales had the same volume. Look at the resulting dataset and try to come up with some insights of your own.

LEAD

LEAD is an analytical function that returns the value of a column from a row that is a given number of rows after the current row in the window partition. In other words, the offset is counted forward (downstream) in the specified sort order.
Like LAG, LEAD reads a value from another row, but it takes that value from a row after the current row (for example, the next row, or the row two rows after the current one) in the specified sorting order. So the difference lies in the direction in which we take the offset value.

The syntax of the function is the same as for LAG, except for the keyword:

    LEAD(value, offset, default_value)

value is the column we want to access.
offset is the number of rows to go forward from the current row to reach the value.
default_value is the value the function returns when no such row exists (for example, on the last row of a partition).

To demonstrate this function, let's get the same result we got from the LAG example. To achieve this, we will change one small thing besides the LAG keyword. This is the query:

    SELECT salesman_id,
           sales_item,
           sales_num,
           sales_price,
           extract(MONTH FROM datetime) month,
           ABS(lead(sales_num,1,0) over (order by sales_num asc)-sales_num) as DIFFERENCE_SALES
    FROM sales
    WHERE extract(MONTH FROM datetime) <= 6
    ORDER BY sales_num desc;

Can you spot the difference? (Hint: This was the challenge in the FIRST_VALUE / LAST_VALUE demonstration.)

Conclusion

On your learning path, I encourage you to practice, practice, practice 😉 There is no better place for that than LearnSQL.com. Discover how you can practice advanced SQL with our platform. The new functions are best mastered and practiced through the excellent hands-on courses of LearnSQL. Practice coding as much as you can and use this article as a side companion; this is the best way to learn advanced SQL.