
Who Should Learn SQL Window Functions?

Do you want to learn how SQL window functions can help you at your job? This article will show you examples from various business applications where they can be very useful.

I won’t be explaining what SQL window functions are in this article, but rather how to use them. If you’re not familiar with window functions or their syntax, don’t worry. Here’s an article that can help you with an introduction to SQL window functions.

SQL window functions are very helpful when creating any kind of report. They are typically used in ranking data and calculating running totals. They’re especially useful with time-series data, such as calculating the differences between the previous and current periods. You can calculate moving averages or even the percentage changes compared to the previous periods.

This is applicable to a wide range of professions: from financial experts and analysts to those working in sales, retail, e-commerce, supply chain management, and product management, among others. Public health experts and epidemiologists analyze data using SQL window functions. All decision makers, from team leaders to top management, could benefit from understanding window functions.

SQL window functions can do plenty of other calculations. The examples below will focus only on the calculations I’ve already mentioned. And instead of describing every single possibility, I’ll review the most common uses and present them on different types of data.

To find out what window functions are, read a detailed explanation by the creator of LearnSQL.com’s course on window functions.

Moving on to the examples!

Ranking Data

One of the most common and easiest uses of the window functions is for ranking data. You can do this by using several different functions, such as RANK(), DENSE_RANK(), and ROW_NUMBER().

I’ll review ROW_NUMBER() in the following examples. No special reason; it’s just that I have yet to use ROW_NUMBER() in my writings about window functions, so it’s about time to change that!

Example 1: The Top 15 Salespersons

If you work as a data analyst, you’ve probably had to create reports that contain, well, top lists—be it top salesperson, best-selling products, most visited websites, etc. It doesn’t matter; if there’s data, you can create a report.

You’ve probably seen such lists if you are an employee in a sales department, and your future in the company might depend on them. If you’re a manager of a sales department, I’m sure this is one of your favorite reports. Have you ever tried to create one yourself? It’s easy!

For this example, the data is stored in a table called salesperson. It consists of the following columns:

  • id: the ID of the salesperson
  • first_name: the first name of the salesperson
  • last_name: the last name of the salesperson
  • sales_2019: sales achieved during 2019

Write a little query like this one and you’ll get the result:

SELECT ROW_NUMBER() OVER (ORDER BY sales_2019 DESC) AS salesperson_rank,
       first_name,
       last_name,
       sales_2019
FROM salesperson;

What does this code do? It uses the ROW_NUMBER() function, which assigns a row number to every record in the table. With the OVER() clause, you define the window over which the operation is performed. Omitting PARTITION BY from the OVER() clause means the operation runs over the whole table. Ordering by sales_2019 in descending order puts the best salesperson in the first row. The row number appears in the new column, which I have named salesperson_rank. The code then selects the remaining columns from the table salesperson.
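As written, the query ranks every salesperson in the table. If you want the result itself limited to the first 15 rows, one option (a sketch, assuming a PostgreSQL-style dialect) is to wrap the ranking in a subquery, because a window function can't be referenced directly in WHERE:

-- Keep only the top 15 rows of the ranking (hypothetical wrapper query).
SELECT salesperson_rank,
       first_name,
       last_name,
       sales_2019
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY sales_2019 DESC) AS salesperson_rank,
         first_name,
         last_name,
         sales_2019
  FROM salesperson
) ranked
WHERE salesperson_rank <= 15;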

There you have it, your top 15 salespeople:

salesperson_rank | first_name | last_name | sales_2019
1 | Eachelle | Lound | 573,170.97
2 | Franklin | Loins | 568,735.06
3 | Land | Longhurst | 564,691.01
4 | Nedi | Barsham | 555,217.57
5 | Tina | Ludlow | 538,225.30
6 | Dorise | Gasking | 519,220.00
7 | Ellswerth | Divis | 513,243.52
8 | Ana | Golda | 512,555.12
9 | Lucie | Brewster | 511,441.47
10 | Curran | Daouze | 504,939.45
11 | Constanta | Khomishin | 504,017.19
12 | Bronson | Joburn | 492,430.44
13 | Yvonne | Playhill | 489,094.94
14 | Hortensia | Hartness | 488,289.00
15 | Phillip | Mulqueeny | 484,875.87

If you feel shaky about the syntax, have our SQL window functions cheat sheet open and refer to it while going through the examples.

Example 2: The 15 Worst-Selling Products

For this example, let’s imagine you work as a product manager. You oversaw the creation and launch of thirty new products in the last 12 months for your company. Now you’re interested in seeing which products are not performing well, so you can develop strategies to increase their Amazon sales or replace them with another product.

The data about the new products is in the table new_products. It consists of the following columns:

  • id: the ID of the product
  • product_name: the name of the product
  • number_sold: quantity of the product sold

Of course, the code that will get you the desired report is similar to the one in the previous example. Here it is:

SELECT	ROW_NUMBER() OVER (ORDER BY number_sold ASC) AS worst_products,
		product_name,
		number_sold
FROM new_products;

I’ve again used the ROW_NUMBER() function. In the OVER() clause, I want the data to be ranked according to the number_sold column. It has to be ranked in ascending order, hence the ASC in the code. Row numbers will be shown in the column worst_products. The rest of the code selects the remaining columns from the table new_products.

Run the code and you will see the full ranking; the first 15 rows are your worst-performing products:

worst_products | product_name | number_sold
1 | ChicoReal | 4,567
2 | WillowBook | 4,587
3 | Somoon | 6,587
4 | DaskaPeetal | 7,821
5 | Huisterdenkaart | 8,564
6 | OneZemalyac | 12,284
7 | Streechek | 12,284
8 | BarbieQue | 14,562
9 | Bleetwa | 14,587
10 | Leecymur | 14,587
11 | Yegulya | 14,887
12 | Egesmeder | 18,357
13 | Kuymuck | 20,140
14 | MrBasil | 22,568
15 | ZulufAlba | 31,400
16 | WishyWashy | 48,592
17 | RobiKnotebook | 55,678
18 | Dramalone | 56,897
19 | FragolinoDiMonfalcone | 66,987
20 | KerberQama | 78,521

You’ll probably notice that OneZemalyac and Streechek have the same number of products sold. The same is the case with products Bleetwa, Leecymur, and Yegulya. However, they don’t have the same ranking, because they are numbered sequentially. Products OneZemalyac and Streechek are ranked 6th and 7th, not 6th and 6th. That is, ranking using the ROW_NUMBER() function does not allow for ties.
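If you'd rather have tied products share a rank, RANK() or DENSE_RANK() will do that. Here's a quick sketch of the same report using both, side by side, so you can compare the numbering:

-- Same report, ranked with functions that allow ties.
SELECT RANK() OVER (ORDER BY number_sold ASC) AS rank_with_gaps,
       DENSE_RANK() OVER (ORDER BY number_sold ASC) AS rank_without_gaps,
       product_name,
       number_sold
FROM new_products;

With RANK(), tied rows get the same number and the following rank is skipped (OneZemalyac and Streechek would both be 6th, and the next product 8th); with DENSE_RANK(), no numbers are skipped (both 6th, and the next product 7th).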

Example 3: Ranking Inventory by Region

This example might be of interest to someone managing the inventory or working in supply chain management. You work for a company that has factories in four regions across the country. All the factories produce the same five products, and the goods produced are stored in the regional warehouses. The table inventory has the following information:

  • id: the ID of the product
  • product_name: the name of the product
  • quantity: the quantity of the product stored in the regional warehouses at the end of the year
  • region: the name of the region

You want to improve the product distribution to the customers, which should result in minimizing the quantity of the goods stored as inventory in the regional warehouses. You want to start by ranking the products within each region by their inventories in the warehouse. How would you get the desired report by using SQL window functions?

Here it is:

SELECT ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
       product_name,
       quantity,
       region
FROM inventory;

As in the earlier example, the ROW_NUMBER() function gives you what you want. However, this time I am using the PARTITION BY clause. With it, I define the partitions over which the operation (in this case, ranking) is performed by specifying the column whose values form the groups. If I omitted PARTITION BY, the data would be ranked across the whole table, i.e. regardless of the region. Since I'm interested in seeing the data by region, I partition by the column region. The data is ordered by quantity in descending order, and the rank is shown in the column inventory_rank. The remainder of the code selects the other columns from the table inventory.

Run the code, and you’ll get the report very quickly:

inventory_rank | product_name | quantity | region
1 | POW872 | 10,000 | East
2 | RWU875 | 9,845 | East
3 | IOE935 | 7,894 | East
4 | KFUO24 | 6,894 | East
5 | HGX314 | 1,000 | East
1 | POW872 | 9,457 | North
2 | HGX314 | 8,524 | North
3 | RWU875 | 4,825 | North
4 | IOE935 | 1,578 | North
5 | KFUO24 | 75 | North
1 | KFUO24 | 14,587 | South
2 | RWU875 | 12,845 | South
3 | POW872 | 7,542 | South
4 | HGX314 | 754 | South
5 | IOE935 | 82 | South
1 | HGX314 | 12,587 | West
2 | KFUO24 | 12,300 | West
3 | RWU875 | 4,852 | West
4 | POW872 | 4,489 | West
5 | IOE935 | 518 | West
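A common follow-up question is: which two products tie up the most inventory in each region? Since a window function can't appear directly in WHERE, one way to answer it (a sketch, assuming a PostgreSQL-style dialect) is to reuse the ranking inside a subquery and filter on it:

-- Keep only the two largest inventories per region (hypothetical wrapper query).
SELECT inventory_rank,
       product_name,
       quantity,
       region
FROM (
  SELECT ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
         product_name,
         quantity,
         region
  FROM inventory
) ranked_by_region
WHERE inventory_rank <= 2;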

If you want to learn more about this topic, here’s an article about ranking data using window functions.

Analyzing Trends Over Time

SQL window functions truly show their power in analyzing trends over time. You can calculate the differences between the previous and the current periods, get the percentage increase or decrease compared to the previous periods, or calculate the moving averages. This is often used by brokers, fund managers, or any kind of financial experts who monitor historical data and build or use forecasting models. What you read about COVID-19 every day—the daily numbers of people infected, recovered, or deceased, the estimates of the future development of the pandemic, etc.—is based on analyzing trends over time. Epidemiologists and public health experts do this daily. Take any manager in any company in the world, and their decisions are based on analyzing historical data.

Let’s start by calculating the differences between periods.

Example 4: Calculating Daily Changes of a Share Price

If you are a fund manager, or any kind of investor, you’ll likely be interested in this analysis quite often. If you work as a financial analyst, you’ve probably done this analysis quite frequently. It doesn’t have to be the share price; it can be any data for which you want to compare the current and the previous periods. It doesn’t matter whether it is about daily, weekly, monthly, quarterly, or yearly changes; SQL window functions work the same way.

In this example, you have daily prices of one share. You want to calculate the differences day over day, so you can build a model and forecast future price changes. All the data you need is in the table share which contains the following columns:

  • ticker: the ticker symbol of the share, i.e. the short name under which it is being traded
  • company: the name of the company that issued the share
  • date: the date traded
  • price: the price of the share

First, let’s think about the logic. It’ll make writing the code easier.

What you want to do here is to take the price from the previous day and subtract that from the price of the current day. And you need to do that for every record you have in the table.

Now, let’s try to translate this logic into code:

SELECT	ticker,
		company,
		date,
		price,
		LAG(price) OVER (ORDER BY date) AS previous_day_price
FROM share;

Nothing strange in the first part of the code. I’ve simply selected all the columns from the table share.

Now comes the fun part! There’s something called LAG(). It allows you to go back a certain number of rows and have data from that row be shown in the current row. You can go back any number of rows. The default value is 1, which is why LAG(price) goes back just one row. Even though I didn’t specify the number of rows as LAG(price, 1), it does exactly what I need.

The remainder of the code is like the window functions you’ve already seen. There’s an OVER() clause, and I want the operation to be performed in date order. The previous day’s price will be shown in the column previous_day_price.

Run the code to get the table below:

ticker | company | date | price | previous_day_price
PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL
PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 45.32
PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 46.38
PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 47.12
PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 47.12

However, this is not the analysis you wanted. You’re not interested in just seeing the previous day's price, are you? What you want is to calculate the difference between the current price and the price of the day before. How would you do this in one step, now that you know what the LAG() function can do? Yes, it’s this simple:

SELECT	ticker,
		company,
		date,
		price,
		(price - LAG(price) OVER (ORDER BY date)) AS daily_change
FROM share;

The code is still pretty much the same! The only difference is that the value returned by LAG() is subtracted from price. Yes, it does mean what you think it means! I subtracted the previous day’s price from the current price, and the result is shown in the column daily_change.

Finally, here’s the table you want to see:

ticker | company | date | price | daily_change
PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL
PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 1.06
PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 0.74
PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 0
PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 5.2
PTA | Panthelya Inc. | 2020-06-06 | 58.18 | 5.86
PTA | Panthelya Inc. | 2020-06-07 | 59 | 0.82
PTA | Panthelya Inc. | 2020-06-08 | 62.54 | 3.54
PTA | Panthelya Inc. | 2020-06-09 | 58.64 | -3.9
PTA | Panthelya Inc. | 2020-06-10 | 60.08 | 1.44
PTA | Panthelya Inc. | 2020-06-11 | 69.84 | 9.76
PTA | Panthelya Inc. | 2020-06-12 | 43.22 | -26.62
PTA | Panthelya Inc. | 2020-06-13 | 52.22 | 9
PTA | Panthelya Inc. | 2020-06-14 | 77.54 | 25.32
PTA | Panthelya Inc. | 2020-06-15 | 94.21 | 16.67
PTA | Panthelya Inc. | 2020-06-16 | 92.84 | -1.37
PTA | Panthelya Inc. | 2020-06-17 | 92.75 | -0.09
PTA | Panthelya Inc. | 2020-06-18 | 93 | 0.25
PTA | Panthelya Inc. | 2020-06-19 | 92.84 | -0.16
PTA | Panthelya Inc. | 2020-06-20 | 94.45 | 1.61
PTA | Panthelya Inc. | 2020-06-21 | 94.49 | 0.04
PTA | Panthelya Inc. | 2020-06-22 | 94.21 | -0.28
PTA | Panthelya Inc. | 2020-06-23 | 98.18 | 3.97
PTA | Panthelya Inc. | 2020-06-24 | 92.27 | -5.91
PTA | Panthelya Inc. | 2020-06-25 | 97.84 | 5.57
PTA | Panthelya Inc. | 2020-06-26 | 42.56 | -55.28
PTA | Panthelya Inc. | 2020-06-27 | 32.54 | -10.02
PTA | Panthelya Inc. | 2020-06-28 | 28.63 | -3.91
PTA | Panthelya Inc. | 2020-06-29 | 30.24 | 1.61
PTA | Panthelya Inc. | 2020-06-30 | 38.64 | 8.4

The daily_change in the first row is NULL because the first day in the data has no previous price to subtract.
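Two variations are worth knowing here; the sketch below assumes a PostgreSQL-style LAG(). Its optional second and third arguments let you look further back and supply a default value instead of NULL, and if the table contained more than one ticker, you would add PARTITION BY ticker so the prices of different shares never get mixed up:

SELECT ticker,
       company,
       date,
       price,
       -- Look back 7 rows (roughly week over week) and fall back to the
       -- current price when there is no row that far back.
       price - LAG(price, 7, price) OVER (PARTITION BY ticker ORDER BY date) AS weekly_change
FROM share;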

Example 5: Calculating Daily Percent Changes of New COVID-19 Cases

Epidemiologists and public health experts are under the spotlight these days and under a lot of pressure. Their job is not easy right now, but SQL window functions can make it a little bit easier.

Imagine there’s a pandemic going on in the world. OK, we don’t have to imagine that. You have daily data for new COVID-19 cases in an imaginary country. For our amusement, let’s call it Covidlandia. You need to analyze the data and calculate the daily percentage change in new cases. The data in the table covid_19_new_cases, of course, is completely made up. There are three columns:

  • country: the country of the new cases
  • date: the date of the new cases
  • new_cases: the number of new cases

As in the previous example, let’s talk about logic and mathematics first. How would you get the desired result without SQL? For example, there are 78 new cases today, and there were 54 new cases yesterday. You should subtract yesterday’s number from today’s number, then divide the difference by yesterday’s number. To get the percentage, multiply the result by 100. In other words:

(78-54)/54*100 = 44.44%

Now, let’s translate this into SQL code:

SELECT country,
       date,
       new_cases,
       (new_cases - LAG(new_cases) OVER (ORDER BY date))
         / LAG(new_cases) OVER (ORDER BY date) * 100 AS daily_percent_change
FROM covid_19_new_cases;

Since you already understand the logic of the LAG() function from the previous example, I won’t break the code into detailed steps. In the first part, you can see I’ve selected the country, date, and new_cases columns from the table covid_19_new_cases.

Now comes the part that might look scary. But it’s not; it’s nearly the same code as in the previous example. Let’s analyze it! First, I subtract the number of cases of the previous day from today’s number. This is exactly what the part new_cases - LAG(new_cases) OVER (ORDER BY date) does. Then I divide the result by the number of cases of the previous day, which is: LAG(new_cases) OVER (ORDER BY date). I multiply the result by 100 to get the percentage, which is shown in the column daily_percent_change.

The result can be shown as a table:

country | date | new_cases | daily_percent_change
Covidlandia | 2020-06-01 | 12 | NULL
Covidlandia | 2020-06-02 | 18 | 50.00
Covidlandia | 2020-06-03 | 17 | -5.56
Covidlandia | 2020-06-04 | 25 | 47.06
Covidlandia | 2020-06-05 | 32 | 28.00
Covidlandia | 2020-06-06 | 38 | 18.75
Covidlandia | 2020-06-07 | 40 | 5.26
Covidlandia | 2020-06-08 | 45 | 12.50
Covidlandia | 2020-06-09 | 57 | 26.67
Covidlandia | 2020-06-10 | 112 | 96.49
Covidlandia | 2020-06-11 | 158 | 41.07
Covidlandia | 2020-06-12 | 158 | 0.00
Covidlandia | 2020-06-13 | 174 | 10.13
Covidlandia | 2020-06-14 | 184 | 5.75
Covidlandia | 2020-06-15 | 190 | 3.26
Covidlandia | 2020-06-16 | 187 | -1.58
Covidlandia | 2020-06-17 | 184 | -1.60
Covidlandia | 2020-06-18 | 204 | 10.87
Covidlandia | 2020-06-19 | 208 | 1.96
Covidlandia | 2020-06-20 | 208 | 0.00
Covidlandia | 2020-06-21 | 212 | 1.92
Covidlandia | 2020-06-22 | 248 | 16.98
Covidlandia | 2020-06-23 | 357 | 43.95
Covidlandia | 2020-06-24 | 419 | 17.37
Covidlandia | 2020-06-25 | 416 | -0.72
Covidlandia | 2020-06-26 | 403 | -3.13
Covidlandia | 2020-06-27 | 400 | -0.74
Covidlandia | 2020-06-28 | 396 | -1.00
Covidlandia | 2020-06-29 | 396 | 0.00
Covidlandia | 2020-06-30 | 347 | -12.37

Note that the first value is again NULL. The reason is the same as it was in the previous example.
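One caveat, depending on your database: if new_cases is stored as an integer, some engines (PostgreSQL among them) perform integer division here, which would truncate every percentage to a whole number. A hedged fix is to cast the numerator to a decimal type before dividing, and optionally round the result to two decimals:

SELECT country,
       date,
       new_cases,
       -- Cast to numeric so the division keeps its decimal places.
       ROUND(
         CAST(new_cases - LAG(new_cases) OVER (ORDER BY date) AS numeric)
           / LAG(new_cases) OVER (ORDER BY date) * 100,
         2
       ) AS daily_percent_change
FROM covid_19_new_cases;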

Now that you’ve learned this calculation on a made-up data set, try analyzing real COVID-19 data. My colleague’s article gives you very detailed guidance on how to do that and shows some other uses of window functions.

Example 6: Calculating Moving Averages of Monthly Site Visits

This time, you work in an e-commerce company with three sites. You’re the manager at the company, and you’ve asked your analyst to prepare a report showing average monthly visits for each site owned by your company. Your analyst is, of course, very experienced. He or she knows that calculating the average monthly visits is straightforward, but a plain average would not smooth out the volatility and seasonality of the site visits. That is a reason to get a report with moving averages, which consider the current month and the two previous months.

In case you’re not familiar with moving averages, here is a simple example. We have monthly data for hotel overnight stays, their overall average (i.e. the arithmetic mean), and the moving averages. The hospitality industry can be very seasonal, so calculating the arithmetic mean can give you a very distorted picture. You can see that the average of monthly overnight stays is 2,165.83. The moving averages range from 541.33 to 3,832.00, which are far more realistic. Here’s the table to see for yourself:

month | overnight_stays | average | moving_average
01/2019 | 3,582 | 2,165.83 | 3,582.00
02/2019 | 1,802 | 2,165.83 | 2,692.00
03/2019 | 687 | 2,165.83 | 2,023.67
04/2019 | 248 | 2,165.83 | 912.33
05/2019 | 689 | 2,165.83 | 541.33
06/2019 | 2,250 | 2,165.83 | 1,062.33
07/2019 | 3,012 | 2,165.83 | 1,983.67
08/2019 | 5,897 | 2,165.83 | 3,719.67
09/2019 | 2,587 | 2,165.83 | 3,832.00
10/2019 | 482 | 2,165.83 | 2,988.67
11/2019 | 234 | 2,165.83 | 1,101.00
12/2019 | 4,520 | 2,165.83 | 1,745.33

To visualize the difference between the overall average and the moving average, take a look at the chart below:

[Chart: the overall average vs. the moving average of monthly overnight stays]

We now move on to calculating moving averages using SQL window functions.

Back to our monthly site visits. The table site_visit consists of the following columns:

  • id: the ID of the visit
  • site: the name of the site
  • month: the month of the visits
  • number_of_visits: the number of the site visits

The code below is what you need:

SELECT id,
       site,
       month,
       number_of_visits,
       AVG(number_of_visits) OVER (
         PARTITION BY site
         ORDER BY month
         ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING
       ) AS moving_average_visits
FROM site_visit;

Let me explain what I am doing here. First, I select all the columns from the table site_visit. Then comes the interesting part! I use the SQL window function AVG (number_of_visits), since I want the average of the site visits. Then comes the OVER() clause, as always. Since I want averages separately for each site rather than for all three sites together, I use PARTITION BY site. This means I am aggregating data at the site level.

The operation needs to be performed sequentially by month and not in some random order. To ensure this, there is ORDER BY month. I want to calculate the moving averages over the current month and the two previous months to smooth out the volatility. This is defined by ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING. This takes three rows into account when calculating the moving averages: the current row and the two rows preceding it.

Depending on how you want to calculate the moving averages, you can increase or decrease the number of preceding and following rows. For example, if you write ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING, this means you’re taking a total of six rows to calculate the moving average: the current row, the three rows preceding it, and the two rows following it.
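To make this concrete, here is the same site-visit query sketched with that wider, centered frame; only use it if you really want future months to influence the average:

SELECT id,
       site,
       month,
       number_of_visits,
       AVG(number_of_visits) OVER (
         PARTITION BY site
         ORDER BY month
         -- Six-row frame: three months back, the current month, and two months ahead.
         ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
       ) AS centered_moving_average
FROM site_visit;

For the report below, I stick with the original three-month frame.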

Here’s my report:

id | site | month | number_of_visits | moving_average_visits
1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00
2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00
3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67
4 | E-commerceSuperSite | 04/2019 | 45,987,452 | 45,899,115.33
5 | E-commerceSuperSite | 05/2019 | 46,124,756 | 45,989,891.00
6 | E-commerceSuperSite | 06/2019 | 46,125,746 | 46,079,318.00
7 | E-commerceSuperSite | 07/2019 | 46,124,756 | 46,125,086.00
8 | E-commerceSuperSite | 08/2019 | 46,114,784 | 46,121,762.00
9 | E-commerceSuperSite | 09/2019 | 46,125,411 | 46,121,650.33
10 | E-commerceSuperSite | 10/2019 | 46,125,784 | 46,121,993.00
11 | E-commerceSuperSite | 11/2019 | 46,178,421 | 46,143,205.33
12 | E-commerceSuperSite | 12/2019 | 46,170,254 | 46,158,153.00
1 | GreatSite4U | 01/2019 | 56,789 | 56,789.00
2 | GreatSite4U | 02/2019 | 74,564 | 65,676.50
3 | GreatSite4U | 03/2019 | 85,426 | 72,259.67
4 | GreatSite4U | 04/2019 | 72,547 | 77,512.33
5 | GreatSite4U | 05/2019 | 75,000 | 77,657.67
6 | GreatSite4U | 06/2019 | 92,546 | 80,031.00
7 | GreatSite4U | 07/2019 | 89,546 | 85,697.33
8 | GreatSite4U | 08/2019 | 87,237 | 89,776.33
9 | GreatSite4U | 09/2019 | 87,412 | 88,065.00
10 | GreatSite4U | 10/2019 | 76,398 | 83,682.33
11 | GreatSite4U | 11/2019 | 69,874 | 77,894.67
12 | GreatSite4U | 12/2019 | 84,417 | 76,896.33
1 | PleaseVisit | 01/2019 | 897 | 897.00
2 | PleaseVisit | 02/2019 | 658 | 777.50
3 | PleaseVisit | 03/2019 | 2,587 | 1,380.67
4 | PleaseVisit | 04/2019 | 6,845 | 3,363.33
5 | PleaseVisit | 05/2019 | 10,254 | 6,562.00
6 | PleaseVisit | 06/2019 | 11,487 | 9,528.67
7 | PleaseVisit | 07/2019 | 13,345 | 11,695.33
8 | PleaseVisit | 08/2019 | 14,897 | 13,243.00
9 | PleaseVisit | 09/2019 | 15,497 | 14,579.67
10 | PleaseVisit | 10/2019 | 18,845 | 16,413.00
11 | PleaseVisit | 11/2019 | 28,467 | 20,936.33
12 | PleaseVisit | 12/2019 | 84,417 | 43,909.67

Let’s analyze the result a bit to understand how the calculation works.

The first moving average is the same as the number of visits. This is expected, because there is no data before this row. SQL takes the total number of visits, divides it by the number of records (which is one, in this case), and returns the result that equals the number of visits for the month.

id | site | month | number_of_visits | moving_average_visits
1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00

Did you expect that the moving average in the second row would also equal that month’s number of visits, since there isn’t enough history for a full three-month window? If you did, you would be wrong! This moving average sums the current row and the row before it, then divides by the number of rows actually in the frame. The sum is divided by two, even though I specified a three-row frame, because only two rows exist so far. Let’s check the result:

(45,789,465 + 45,852,429)/2 = 45,820,947.00

It is correct!

id | site | month | number_of_visits | moving_average_visits
1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00
2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00

The third row is finally doing what I want: take the current row, take the two rows before it, and return the average. Let’s check:

(45,789,465 + 45,852,429 + 45,857,465)/3 = 45,833,119.67

Correct again! No need to check anything else; maybe we should just start trusting SQL!

id | site | month | number_of_visits | moving_average_visits
1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00
2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00
3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67
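If those "incomplete" averages at the start of each site's history bother you, one hedged option is to count the rows that fall inside the same frame and blank out any average built from fewer than three months:

-- NULL for the first two months of each site, a true three-month average afterwards.
SELECT id,
       site,
       month,
       number_of_visits,
       CASE
         WHEN COUNT(*) OVER (PARTITION BY site ORDER BY month
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) = 3
         THEN AVG(number_of_visits) OVER (PARTITION BY site ORDER BY month
                                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
       END AS full_window_moving_average
FROM site_visit;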

Calculating Running Totals

Running totals are found in different kinds of analysis. They can be very helpful in financial analysis, for example, and SQL window functions provide you with the tools to calculate them easily.

Running totals are also called cumulative sums, since they add the current value to the total of all previous values. This article about running totals has an approachable explanation, with several examples of how they are used. That doesn’t mean I won’t show you some examples too!

Example 7: Calculating Running Totals of the Debt Collected by Your Call Center

It seems every company has a call center today. Banks, other financial institutions, debt collection agencies, telecom companies, you name it. One purpose of a call center is to remind customers about their unpaid bills. You’re monitoring the efficiency of the call center and want to analyze the amounts collected after the call center contacts customers.

The table is debt_collected, and its columns are as follows:

  • id: the ID of the debt collected
  • month: the month when the debt was collected
  • amount: the amount of the debt collected

To get the running totals, you need this code:

SELECT	id,
		month,
		amount,
		SUM(amount) OVER (ORDER BY month) AS debt_collected_rt
FROM debt_collected;

First, I select the columns from the table debt_collected. Then I need to calculate the running totals. To do that, I use SUM() as a window function, with the column to be totaled specified in the parentheses. Then comes the OVER() clause; PARTITION BY is omitted because I want the running total over all the data available. The operation has to be performed sequentially, from January to December, and not in some random order; hence the data is ordered by month. The result appears in the column debt_collected_rt.

The result of the query looks like this:

id | month | amount | debt_collected_rt
1 | 01/2019 | 575,457.28 | 575,457.28
2 | 02/2019 | 578,200.85 | 1,153,658.13
3 | 03/2019 | 567,257.77 | 1,720,915.90
4 | 04/2019 | 657,452.12 | 2,378,368.02
5 | 05/2019 | 622,157.42 | 3,000,525.44
6 | 06/2019 | 608,745.47 | 3,609,270.91
7 | 07/2019 | 594,122.33 | 4,203,393.24
8 | 08/2019 | 591,114.49 | 4,794,507.73
9 | 09/2019 | 541,258.68 | 5,335,766.41
10 | 10/2019 | 584,127.11 | 5,919,893.52
11 | 11/2019 | 587,774.43 | 6,507,667.95
12 | 12/2019 | 596,471.87 | 7,104,139.82
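For the record, the running total relies on the default window frame that comes with ORDER BY, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Since each month appears only once in this table, spelling the frame out with ROWS, as sketched below, returns exactly the same result; it just makes the intent explicit:

SELECT id,
       month,
       amount,
       SUM(amount) OVER (
         ORDER BY month
         -- Everything from the first row up to and including the current row.
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS debt_collected_rt
FROM debt_collected;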

Example 8: Calculating Running Totals of the Employee Costs

Let’s practice a little more with a similar example. What if you’re an HR manager and need to see the monthly employee costs, together with the running totals? For instance, if your task is to plan the budget for the next year, you need some historical data to help you. The previous year’s budget is always a good starting point.

The data can be found in the table employee_costs, and its columns are:

  • id: the ID of the employee costs
  • month: the month of the employee costs
  • amount: the amount of the employee costs

The code is practically the same as in the previous example:

SELECT	id,
		month,
		amount,
		SUM(amount) OVER (ORDER BY month) AS costs_rt
FROM employee_costs;

I’ve selected all the columns from the table employee_costs. The window function sums the amount month by month and puts the result in the new column costs_rt.

id | month | amount | costs_rt
1 | 01/2019 | 84,992.57 | 84,992.57
2 | 02/2019 | 87,562.24 | 172,554.81
3 | 03/2019 | 86,451.82 | 259,006.63
4 | 04/2019 | 86,451.82 | 345,458.45
5 | 05/2019 | 85,456.13 | 430,914.58
6 | 06/2019 | 86,782.45 | 517,697.03
7 | 07/2019 | 88,253.45 | 605,950.48
8 | 08/2019 | 88,795.64 | 694,746.12
9 | 09/2019 | 89,974.34 | 784,720.46
10 | 10/2019 | 92,444.44 | 877,164.90
11 | 11/2019 | 93,012.55 | 970,177.45
12 | 12/2019 | 93,999.14 | 1,064,176.59
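For budget planning, it can also help to see what share of the year's total cost has accumulated by each month. Here's a sketch that combines the running total with a grand total; an empty OVER() spans the whole table, and amount is assumed to be a numeric (decimal) column so ROUND() works on it:

SELECT id,
       month,
       amount,
       SUM(amount) OVER (ORDER BY month) AS costs_rt,
       -- Running total as a percentage of the full-year total.
       ROUND(SUM(amount) OVER (ORDER BY month) / SUM(amount) OVER () * 100, 2) AS pct_of_annual_costs
FROM employee_costs;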

Example 9: Calculating Running Totals of Quarterly Sales

If you’re a financial analyst, a sales manager, a regional manager, or any kind of manager, you’ve probably seen reports like this. In this scenario, you work for a company with five geographical regions. You have data on quarterly sales by region. What you want is a report showing the running total of the sales separately for each region.

All the data you need is in the table regional_sales. The columns are:

  • id: the ID of the sales
  • region: the name of the region
  • quarter: the quarter of the sales
  • amount: the amount of the sales

Here’s a simple query that gets you the required results:

SELECT	id,
		region,
		quarter,
		amount,
		SUM (amount) OVER (PARTITION BY region ORDER BY quarter) AS regional_sales_rt
FROM regional_sales;

As always, I select all columns from the table. Then comes the window function part. I sum up the amount by using the SUM() function. The window is defined by OVER(). There is a PARTITION BY clause this time, because I want to see the data by region. The data needs to be summed up in order by quarter, so I order the data by quarter. Finally, the resulting data will be shown in the column called regional_sales_rt.

id | region | quarter | amount | regional_sales_rt
1 | Central Europe | 1Q2019 | 7,854,127.32 | 7,854,127.32
2 | Central Europe | 2Q2019 | 7,782,112.23 | 15,636,239.55
3 | Central Europe | 3Q2019 | 7,612,556.88 | 23,248,796.43
4 | Central Europe | 4Q2019 | 8,023,448.77 | 31,272,245.20
5 | Eastern Europe | 1Q2019 | 5,412,444.62 | 5,412,444.62
6 | Eastern Europe | 2Q2019 | 5,208,412.37 | 10,620,856.99
7 | Eastern Europe | 3Q2019 | 5,132,445.58 | 15,753,302.57
8 | Eastern Europe | 4Q2019 | 5,800,613.22 | 21,553,915.79
13 | Northern Europe | 1Q2019 | 3,541,222.14 | 3,541,222.14
14 | Northern Europe | 2Q2019 | 3,247,772.67 | 6,788,994.81
15 | Northern Europe | 3Q2019 | 3,456,773.29 | 10,245,768.10
16 | Northern Europe | 4Q2019 | 3,320,520.84 | 13,566,288.94
17 | Southern Europe | 1Q2019 | 1,482,222.66 | 1,482,222.66
18 | Southern Europe | 2Q2019 | 1,628,741.56 | 3,110,964.22
19 | Southern Europe | 3Q2019 | 2,208,456.03 | 5,319,420.25
20 | Southern Europe | 4Q2019 | 2,485,212.33 | 7,804,632.58
9 | Western Europe | 1Q2019 | 11,285,774.26 | 11,285,774.26
10 | Western Europe | 2Q2019 | 11,487,662.29 | 22,773,436.55
11 | Western Europe | 3Q2019 | 12,564,442.83 | 35,337,879.38
12 | Western Europe | 4Q2019 | 11,662,451.18 | 47,000,330.56
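One more variation you might find handy: put each region's full-year total next to the running total, so every quarter also shows how far along the year that region is. A sketch, using a second window that partitions by region but has no ORDER BY:

SELECT id,
       region,
       quarter,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY quarter) AS regional_sales_rt,
       -- Same partition, no ORDER BY: the region's full-year total on every row.
       SUM(amount) OVER (PARTITION BY region) AS regional_sales_total
FROM regional_sales;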

Do You Want to Learn SQL Window Functions?

By giving you nine examples, I’ve tried to show you various scenarios in which you could find SQL window functions helpful. What you have learned about window functions here should make it easier for you to start learning through our Window Functions course.

The examples in this article do not cover everything window functions can do. Instead of smothering you with all their possibilities, I presented only a handful of uses. The point was to show how one function can be helpful across several different job descriptions. As you have seen, learning about window functions is a wise move for every manager and data analyst.

Of course, the usefulness of window functions doesn’t stop here. Experts working in sales, retail, inventory, e-commerce, product and supply chain management, finance, and public health, among others, will find that learning window functions makes their jobs easier. I hope I’ve managed to find at least one example you could relate to in your everyday tasks. If I haven’t, it doesn’t mean SQL window functions can’t be useful to you; it just reflects the wide variety of scenarios in which they can be used, which is impossible to cover with a handful of examples. If you think a little about your day-to-day tasks, I’m sure you will find situations in which you could apply window functions.

I’d like to hear from you in the comments section. Feel free to share your experience with SQL window functions and how you use them.