14th Jul 2020 9 minutes read How to Calculate the Difference Between Two Rows in SQL Ignacio L. Bisso sql learn sql window functions Table of Contents Find the Difference Between Two Values in the Same Row Calculating the Difference Between Two Values in the Same Column Calculating the Difference Between Date Values in SQL Finding the Difference Between Non-Consecutive Records More About SQL Calculations and Window Functions Calculating the difference between two rows in SQL can be a challenging task. It is possible – and there’s more than one way to do it. In this article, I’ll explain how to use the SQL window functions LEAD() and LAG() to find the difference between two rows in the same table. The best way to learn window functions is our interactive Window Functions course. It contains over 200 hands-on exercise that will teach you all window functions concepts, from simple OVER() clause to complex topics such as combining window functions and GROUP BY clause. If you already know window functions and want to refresh your skills, check out our Window Functions Practice Set with 100 hands-on exercises. I like to think of this as a “happy hour" article – you can learn about two topics (calculating the difference between two rows and SQL window functions) by reading just one article. Let’s get right into it! Find the Difference Between Two Values in the Same Row To calculate any difference, you need two elements; to calculate a difference in SQL, you need two records. You can calculate the difference between two columns in the same record, as I’ll show in a moment. It’s very easy. However, I’ll mainly focus on finding the difference between two values of the same column in different records. First, let’s talk about our data. We’ll use a database with two tables, as used by the government of an imaginary province to define some social programs. The first table is housing, which contains data about the number of people renting, owning, or needing a house. Have a look: Table housing CityYearTotal PopulationPopulation Owning housePopulation renting housePopulation needing house Goldpolis201722501500500250 Silverpolis201717501200400150 Bronzepolis201714201000300120 Goldpolis201824251600550275 Silverpolis201819201300410210 Bronzepolis201817301020300410 Goldpolis201926101750580280 Silverpolis201921101400420290 Bronzepolis201920101050300660 To design a plan to build houses for people who need them, the government wants to obtain some metrics about the housing problems in different cities. Suppose the government wants to know how many people in each city don’t own a house; the query will be: SELECT city, total_population, total_population - population_owning_house AS people_not_owning_house FROM housing In this query, it’s clear you’re calculating a difference using two different columns in the same record. Next, you’ll calculate differences using two records. Calculating the Difference Between Two Values in the Same Column Usually, all the operations we do in a SQL query are related to the current record. But in this case, we need a second record. This is the tricky part. We’ll use a technique that employs the windows functions LAG() and LEAD() to obtain data from another record. If you want to go deeper, I suggest reading this window functions article that gives clear explanations of window functions with plenty of examples. Let’s suppose that, for one specific city ( ‘Bronzepolis’ ), you want to obtain the variation of people needing a house related to the previous year. Let’s see the query: SELECT city, year, population_needing_house, LAG(population_needing_house) AS previous_year, population_needing_house - LAG(population_needing_house) OVER (ORDER BY year ) AS difference_previous_year FROM housing WHERE city = ‘Bronzepolis’ ORDER BY year In the blue text, you can see the calculation of the SQL delta between two rows. To calculate a difference, you need a pair of records; those two records are “the current record” and “the previous year’s record”. You obtain this record using the LAG() window function. This function allows you to obtain data from the previous record (based on an order criterion, which here is “ORDER BY year”). Below is the result of this query. The yellow arrows show that the value of the LAG() function returns the same population_needing_house value as the previous year’s record. The column at the right shows the result of the difference between the current and previous years. For the next query, we’ll expand the analysis of housing issues to all cities. For each city, we want the difference between the current and previous years for the population_needing_house column. First, you’ll remove the condition city = ‘Bronzepolis’. You want to calculate values for each city, so you need a way to separate data into groups. This is why you add the PARTITION BY city clause to LAG(). PARTITION BY city allows you to process all the records for the same city in the same window. You’ll use LAG() again to calculate the difference between the number of people needing a house between this year and the previous year. Let’s see the query: SELECT city, year, population_needing_house, LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year ) AS previous_year, population_needing_house - LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year FROM housing ORDER BY city, year In blue text, you can see how the difference is calculated. Here’s a plain-English version of what’s going on: difference_previous_year = population_needing_house in current record - population_needing_house in the previous year’s record Where: Population_needing_house in the current record is the column population_needing_house and Population_needing_house in the previous year’s record is obtained by the function LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year ) CityYearPopulation needing housePrevious YearDifference previous year Bronzepolis2017150nullnull Bronzepolis2018410150290 Bronzepolis2019660410250 Goldpolis2017250nullnull Goldpolis201827525025 Goldpolis20192802755 Silverpolis2017150nullnull Silverpolis201821015060 Silverpolis201929021080 So, you've used the PARTITION clause to create a set with all the records for the same city. Then the ORDER BY clause is used to order all these records by year. Finally, the LAG() function is used to obtain the population_needing_house value from the previous record. Another SQL window function, LEAD(), is similar to LAG(). However, it returns the next record in the set (in our example, this would be the next year’s record). If you review the two previous queries, you’ll see that we use the same pattern to calculate the difference: subtracting the previous column value from the current value using LAG() (or from the next record using LEAD()). SQL window functions are very powerful. For examples from different business areas, like finance, sales, and trading, check out this article on when to use SQL window functions. Calculating the Difference Between Date Values in SQL In the previous examples, you calculated the delta between two rows using numeric column values. Now I’ll demonstrate how to calculate the difference between two values of the date data type. First, let’s consider the result’s data type. When you calculate the difference between two date values, the result is not a date. It is an interval that represents the number of days between the two dates. Suppose there’s a table called hospital_statistics that stores statistics for the province’s three hospitals. Below is a sample of the table data: Table hospital_statistics Hospital namedayillnessnumber of patients Hospital of Bronzepolis2017-03-22MDLR1 Hospital of Goldpolis2017-12-03MDLR1 Hospital of Silverpolis2018-08-03MDLR1 Hospital of Bronzepolis2019-01-23MDLR1 Hospital of Goldpolis2019-06-14MDLR1 Now let’s say there’s a rare illness called MDLR. The government wants to study how often a patient with MDLR is admitted to any of the hospitals. They’ve asked for a report with the columns day, hospital name, number of cases, and days_since_the_last_case. We’ll use SQL to build the report: SELECT day, hospital_name, number_of_patients, day - LAG(day) OVER (ORDER BY day) AS days_since_last_case FROM hospital_statistics WHERE illness_name = 'MDLR' ORDER BY day You can see the same pattern that we used before. Now, it’s being used to calculate the days since the last case. The only difference is that you’re calculating a difference between two dates instead of numeric values. As you see in the results, the column days_since_last_case is an integer value, representing a number of days. DayHospital NameIllnessPatientsdays_since_last_case 2017-03-22Hospital of BronzepolisMDLR1null 2017-12-03Hospital of GoldpolisMDLR1256 2018-08-03Hospital of SilverpolisMDLR1243 2019-01-23Hospital of BronzepolisMDLR1173 2019-06-14Hospital of GoldpolisMDLR1142 SQL also allows you to calculate differences between timestamp values. And there are other interesting date arithmetic operations you can use on date-related data types. If you want to go deeper, I suggest the article How to Analyze Time Series COVID 19 Data with SQL Window Functions. Finding the Difference Between Non-Consecutive Records Until now, you’ve calculated differences between contiguous records based on specific order criteria. In some cases, you’ll need to calculate the difference between non-contiguous records. As an example, let’s go back to the first query. Suppose we want to add another column showing the number of cases in the last two years. Fortunately, LAG() and LEAD() has an optional parameter that specifies how many records to skip before/after the current record. By default, this parameter is 1 (i.e. “use the next/previous record”), but you can set it to another number. So, with this new parameter, the query will be: SELECT city, year, population_needing_house, LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year ) AS one_year_before, LAG(population_needing_house,2) OVER (PARTITION BY city ORDER BY year ) AS two_years_before, population_needing_house - LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year, population_needing_house - LAG(population_needing_house,2) OVER (PARTITION BY city ORDER BY year ) AS difference_last_two_years FROM housing ORDER BY 1, 2 DESC In blue, you can see the calculation of the difference between two non-consecutive records. In the next image, we can see the new column Difference last two years only for the 2019 values; this is because 2019 is the only year that has data for two prior years. The red lines connect the LAG() function’s results (obtaining data for 2017 and showing the same data in the 2017 record). You can see that both lines have the same value. We’ve shown how important LAG() and LEAD() are, but there are many more window functions in SQL. Read this article about window functions to learn about more of them. More About SQL Calculations and Window Functions In this article, you’ve learned how to calculate the difference between consecutive rows by applying SQL. You’ve also learned how to use LAG() and LEAD() to find the difference between non-consecutive rows. Although there are other ways to calculate this difference, such as using a self-join, these methods are very useful. If you will be using windows functions (or indeed SQL) frequently – or if you just want to improve your knowledge – this complete Window Functions Cheat Sheet is an excellent resource. To learn window functions through practice, try our Window Functions course. You can read about the course in this article. Tags: sql learn sql window functions