Who Should Learn SQL Window Functions?
By Tihomir Babic

Table of Contents
- Ranking Data
  - Example 1: The Top 15 Salespersons
  - Example 2: The 15 Worst-Selling Products
  - Example 3: Ranking Inventory by Region
- Analyzing Trends Over Time
  - Example 4: Calculating Daily Changes of a Share Price
  - Example 5: Calculating Daily Percent Changes of New COVID-19 Cases
  - Example 6: Calculating Moving Averages of Monthly Site Visits
- Calculating Running Totals
  - Example 7: Calculating Running Totals of the Debt Collected by Your Call Center
  - Example 8: Calculating Running Totals of the Employee Costs
  - Example 9: Calculating Running Totals of Quarterly Sales
- Do You Want to Learn SQL Window Functions?

Do you want to learn how SQL window functions can help you at your job? This article will show you examples from various business applications where they can be very useful.

I won't be explaining what SQL window functions are in this article, but rather how to use them. If you're not familiar with window functions or their syntax, don't worry. Here's an article that can help you with an introduction to SQL window functions.

SQL window functions are very helpful when creating any kind of report. They are typically used for ranking data and calculating running totals. They're especially useful with time-series data, such as calculating the differences between the previous and current periods. You can calculate moving averages or even percentage changes compared to previous periods.

This is applicable to a wide range of professions: from financial experts and analysts to those working in sales, retail, e-commerce, supply chain management, and product management, among others. Public health experts and epidemiologists analyze data using SQL window functions. All decision makers, from team leaders to top management, could benefit from understanding window functions.

SQL window functions can do plenty of other calculations. The examples below will focus only on the calculations I've already mentioned. And instead of describing every single possibility, I'll review the most common uses and present them on different types of data.

To find out what window functions are, read a detailed explanation by the creator of LearnSQL.com's course on window functions.
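One more thing before the examples: here is a minimal sketch of the general shape that every query in this article follows. The table and column names (some_table, other_column, grouping_column, and so on) are placeholders rather than objects from the examples below:

SELECT
  some_column,
  SUM(other_column) OVER (          -- SUM() stands in for any window function: ROW_NUMBER(), AVG(), LAG(), ...
    PARTITION BY grouping_column    -- optional: restart the calculation for each group
    ORDER BY ordering_column        -- optional: order the rows inside each window
  ) AS result_column
FROM some_table;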
Moving on to the examples!

Ranking Data

One of the most common and easiest uses of window functions is ranking data. You can do this with several different functions, such as RANK(), DENSE_RANK(), and ROW_NUMBER(). I'll use ROW_NUMBER() in the following examples. No special reason; it's just that I have yet to use ROW_NUMBER() in my writings about window functions, so it's about time to change that!

Example 1: The Top 15 Salespersons

If you work as a data analyst, you've probably had to create reports that contain, well, top lists: be it the top salespersons, best-selling products, most visited websites, etc. It doesn't matter; if there's data, you can create a report. You've probably seen such lists if you're an employee in a sales department, and your future in the company might depend on them. If you're the manager of a sales department, I'm sure this is one of your favorite reports. Have you ever tried to create one yourself? It's easy!

For this example, the data is stored in a table called salesperson. It consists of the following columns:

- id: the ID of the salesperson
- first_name: the first name of the salesperson
- last_name: the last name of the salesperson
- sales_2019: sales achieved during 2019

Write a little query like this one and you'll get the result:

SELECT
  ROW_NUMBER() OVER (ORDER BY sales_2019 DESC) AS salesperson_rank,
  first_name,
  last_name,
  sales_2019
FROM salesperson;

What does this code do? It uses the ROW_NUMBER() function, which assigns a row number to every record in the table. With the OVER() clause, you define the window over which the operation is performed. Omitting PARTITION BY from the OVER() clause means the operation is performed over the whole table. Ordering by sales_2019 in descending order means the best salesperson appears in the first row. The row number is shown in the new column, which I have named salesperson_rank. The code then selects the remaining columns from the table salesperson.

There you have it, your top 15 salespeople:

| salesperson_rank | first_name | last_name | sales_2019 |
|---|---|---|---|
| 1 | Eachelle | Lound | 573,170.97 |
| 2 | Franklin | Loins | 568,735.06 |
| 3 | Land | Longhurst | 564,691.01 |
| 4 | Nedi | Barsham | 555,217.57 |
| 5 | Tina | Ludlow | 538,225.30 |
| 6 | Dorise | Gasking | 519,220.00 |
| 7 | Ellswerth | Divis | 513,243.52 |
| 8 | Ana | Golda | 512,555.12 |
| 9 | Lucie | Brewster | 511,441.47 |
| 10 | Curran | Daouze | 504,939.45 |
| 11 | Constanta | Khomishin | 504,017.19 |
| 12 | Bronson | Joburn | 492,430.44 |
| 13 | Yvonne | Playhill | 489,094.94 |
| 14 | Hortensia | Hartness | 488,289.00 |
| 15 | Phillip | Mulqueeny | 484,875.87 |

If you feel shaky about the syntax, have our SQL window functions cheat sheet open and refer to it while going through the examples.

Example 2: The 15 Worst-Selling Products

For this example, let's imagine you work as a product manager. You oversaw the creation and launch of thirty new products in the last 12 months for your company. Now you're interested in seeing which products are not performing well, so you can develop strategies to increase their Amazon sales or replace them with another product. The data about the new products is in the table new_products. It consists of the following columns:

- id: the ID of the product
- product_name: the name of the product
- number_sold: the quantity of the product sold

Of course, the code that will get you the desired report is similar to the one in the previous example. Here it is:

SELECT
  ROW_NUMBER() OVER (ORDER BY number_sold ASC) AS worst_products,
  product_name,
  number_sold
FROM new_products;

I've again used the ROW_NUMBER() function. In the OVER() clause, I want the data to be ranked according to the number_sold column. It has to be ranked in ascending order, hence the ASC in the code. Row numbers will be shown in the column worst_products. The rest of the code selects the remaining columns from the table new_products.

Run the code, and you will see your worst-performing products ranked from the bottom up; the first 15 rows are your 15 worst sellers:

| worst_products | product_name | number_sold |
|---|---|---|
| 1 | ChicoReal | 4,567 |
| 2 | WillowBook | 4,587 |
| 3 | Somoon | 6,587 |
| 4 | DaskaPeetal | 7,821 |
| 5 | Huisterdenkaart | 8,564 |
| 6 | OneZemalyac | 12,284 |
| 7 | Streechek | 12,284 |
| 8 | BarbieQue | 14,562 |
| 9 | Bleetwa | 14,587 |
| 10 | Leecymur | 14,587 |
| 11 | Yegulya | 14,887 |
| 12 | Egesmeder | 18,357 |
| 13 | Kuymuck | 20,140 |
| 14 | MrBasil | 22,568 |
| 15 | ZulufAlba | 31,400 |
| 16 | WishyWashy | 48,592 |
| 17 | RobiKnotebook | 55,678 |
| 18 | Dramalone | 56,897 |
| 19 | FragolinoDiMonfalcone | 66,987 |
| 20 | KerberQama | 78,521 |

You'll probably notice that OneZemalyac and Streechek have the same number of products sold. The same is the case with the products Bleetwa and Leecymur. However, they don't have the same rank, because ROW_NUMBER() numbers rows sequentially. OneZemalyac and Streechek are ranked 6th and 7th, not 6th and 6th. That is, ranking with the ROW_NUMBER() function does not allow for ties.
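If you would rather have tied products share the same rank, one option is to swap ROW_NUMBER() for RANK() or DENSE_RANK(). Here's a sketch against the same new_products table; the column aliases rank_with_gaps and rank_without_gaps are just illustrative names I've chosen:

SELECT
  RANK() OVER (ORDER BY number_sold ASC) AS rank_with_gaps,          -- ties share a rank; the next rank is skipped
  DENSE_RANK() OVER (ORDER BY number_sold ASC) AS rank_without_gaps, -- ties share a rank; no gaps afterwards
  product_name,
  number_sold
FROM new_products;

With RANK(), OneZemalyac and Streechek would both be ranked 6th and the next product 8th; with DENSE_RANK(), the next product would be 7th.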
Example 3: Ranking Inventory by Region

This example might be of interest to someone managing inventory or working in supply chain management. You work for a company that has factories in four regions across the country. All the factories produce the same five products, and the goods produced are stored in regional warehouses. The table inventory has the following information:

- id: the ID of the product
- product_name: the name of the product
- quantity: the quantity of the product stored in the regional warehouse at the end of the year
- region: the name of the region

You want to improve product distribution to the customers, which should result in minimizing the quantity of goods stored as inventory in the regional warehouses. You want to start by ranking the products within each region by their inventories in the warehouse. How would you get the desired report using SQL window functions? Here it is:

SELECT
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
  product_name,
  quantity,
  region
FROM inventory;

As in the earlier examples, the ROW_NUMBER() function gives you what you want. However, I'm using the PARTITION BY clause this time. With this clause, I define the partitions over which the operation (in this case, ranking) is performed by specifying the column whose values define the groups. If I omitted the PARTITION BY clause, the data would be ranked across the whole table, i.e. regardless of the region. Since I'm interested in seeing the data by region, I've partitioned by the column region. The data is ordered by quantity in descending order, and the rank is shown in the column inventory_rank. The remainder of the code selects the other columns from the table inventory.

Run the code, and you'll get the report very quickly:

| inventory_rank | product_name | quantity | region |
|---|---|---|---|
| 1 | POW872 | 10,000 | East |
| 2 | RWU875 | 9,845 | East |
| 3 | IOE935 | 7,894 | East |
| 4 | KFUO24 | 6,894 | East |
| 5 | HGX314 | 1,000 | East |
| 1 | POW872 | 9,457 | North |
| 2 | HGX314 | 8,524 | North |
| 3 | RWU875 | 4,825 | North |
| 4 | IOE935 | 1,578 | North |
| 5 | KFUO24 | 75 | North |
| 1 | KFUO24 | 14,587 | South |
| 2 | RWU875 | 12,845 | South |
| 3 | POW872 | 7,542 | South |
| 4 | HGX314 | 754 | South |
| 5 | IOE935 | 82 | South |
| 1 | HGX314 | 12,587 | West |
| 2 | KFUO24 | 12,300 | West |
| 3 | RWU875 | 4,852 | West |
| 4 | POW872 | 4,489 | West |
| 5 | IOE935 | 518 | West |

If you want to learn more about this topic, here's an article about ranking data using window functions.
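A possible next step, not part of the original report: if you only wanted the product with the largest inventory in each region, you could wrap the ranking in a subquery and filter on it, since a window function can't be used directly in a WHERE clause. A sketch, assuming the same inventory table (the alias ranked is just an illustrative name):

SELECT product_name, quantity, region
FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
    product_name,
    quantity,
    region
  FROM inventory
) ranked
WHERE inventory_rank = 1;  -- keep only the top-ranked product per region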
Analyzing Trends Over Time

SQL window functions truly show their power in analyzing trends over time. You can calculate the differences between the previous and the current periods, get the percentage increase or decrease compared to previous periods, or calculate moving averages. This is often used by brokers, fund managers, or any other financial experts who monitor historical data and build or use forecasting models. What you read about COVID-19 every day, such as the daily numbers of people infected, recovered, or deceased and the estimates of the future development of the pandemic, is based on analyzing trends over time. Epidemiologists and public health experts do this daily. Take any manager in any company in the world, and their decisions are based on analyzing historical data.

Let's start by calculating the differences between periods.

Example 4: Calculating Daily Changes of a Share Price

If you're a fund manager, or any kind of investor, you'll likely be interested in this analysis quite often. If you work as a financial analyst, you've probably done it quite frequently. It doesn't have to be a share price; it can be any data for which you want to compare the current and the previous periods. And it doesn't matter whether you're looking at daily, weekly, monthly, quarterly, or yearly changes; SQL window functions work the same way.

In this example, you have the daily prices of one share. You want to calculate the differences day over day, so you can build a model and forecast future price changes. All the data you need is in the table share, which contains the following columns:

- ticker: the ticker symbol of the share, i.e. the short name under which it is traded
- company: the name of the company that issued the share
- date: the date traded
- price: the price of the share

First, let's think about the logic. It'll make writing the code easier. What you want to do here is take the price from the previous day and subtract it from the price of the current day. And you need to do that for every record you have in the table.

Now, let's try to translate this logic into code:

SELECT
  ticker,
  company,
  date,
  price,
  LAG(price) OVER (ORDER BY date) AS previous_day_price
FROM share;

There's nothing strange in the first part of the code; I've simply selected all the columns from the table share. Now comes the fun part! There's something called LAG(). It allows you to go back a certain number of rows and show the data from that row in the current row. You can go back any number of rows; the default is 1, which is why LAG(price) goes back just one row. Even though I didn't spell it out as LAG(price, 1), it does exactly what I need. The remainder of the code is like the window functions you've already seen. There's an OVER() clause, and I want the rows ordered by date. The data will be shown in the column previous_day_price.

Run the code to get the table below:

| ticker | company | date | price | previous_day_price |
|---|---|---|---|---|
| PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL |
| PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 45.32 |
| PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 46.38 |
| PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 47.12 |
| PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 47.12 |

However, this is not the analysis you wanted. You're not interested in just seeing the previous day's price, are you? What you want is to calculate the difference between the current price and the price of the day before. How would you do this in one step, now that you know what the LAG() function can do? Yes, it's this simple:

SELECT
  ticker,
  company,
  date,
  price,
  price - LAG(price) OVER (ORDER BY date) AS daily_change
FROM share;

The code is still pretty much the same! The only difference is that the value returned by LAG() is subtracted from the price. Yes, it does mean what you think it means! I subtracted the previous day's price from the current price, with the result shown in the column daily_change.

Finally, here's the table you want to see:

| ticker | company | date | price | daily_change |
|---|---|---|---|---|
| PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL |
| PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 1.06 |
| PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 0.74 |
| PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 0 |
| PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 5.2 |
| PTA | Panthelya Inc. | 2020-06-06 | 58.18 | 5.86 |
| PTA | Panthelya Inc. | 2020-06-07 | 59 | 0.82 |
| PTA | Panthelya Inc. | 2020-06-08 | 62.54 | 3.54 |
| PTA | Panthelya Inc. | 2020-06-09 | 58.64 | -3.9 |
| PTA | Panthelya Inc. | 2020-06-10 | 60.08 | 1.44 |
| PTA | Panthelya Inc. | 2020-06-11 | 69.84 | 9.76 |
| PTA | Panthelya Inc. | 2020-06-12 | 43.22 | -26.62 |
| PTA | Panthelya Inc. | 2020-06-13 | 52.22 | 9 |
| PTA | Panthelya Inc. | 2020-06-14 | 77.54 | 25.32 |
| PTA | Panthelya Inc. | 2020-06-15 | 94.21 | 16.67 |
| PTA | Panthelya Inc. | 2020-06-16 | 92.84 | -1.37 |
| PTA | Panthelya Inc. | 2020-06-17 | 92.75 | -0.09 |
| PTA | Panthelya Inc. | 2020-06-18 | 93 | 0.25 |
| PTA | Panthelya Inc. | 2020-06-19 | 92.84 | -0.16 |
| PTA | Panthelya Inc. | 2020-06-20 | 94.45 | 1.61 |
| PTA | Panthelya Inc. | 2020-06-21 | 94.49 | 0.04 |
| PTA | Panthelya Inc. | 2020-06-22 | 94.21 | -0.28 |
| PTA | Panthelya Inc. | 2020-06-23 | 98.18 | 3.97 |
| PTA | Panthelya Inc. | 2020-06-24 | 92.27 | -5.91 |
| PTA | Panthelya Inc. | 2020-06-25 | 97.84 | 5.57 |
| PTA | Panthelya Inc. | 2020-06-26 | 42.56 | -55.28 |
| PTA | Panthelya Inc. | 2020-06-27 | 32.54 | -10.02 |
| PTA | Panthelya Inc. | 2020-06-28 | 28.63 | -3.91 |
| PTA | Panthelya Inc. | 2020-06-29 | 30.24 | 1.61 |
| PTA | Panthelya Inc. | 2020-06-30 | 38.64 | 8.4 |

The daily_change in the first row is NULL because the first day in the table has no previous price to subtract.
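The share table in this example holds a single ticker. If it held several, you wouldn't want one company's last price subtracted from another company's first, so you would add PARTITION BY ticker to restart the calculation for each share. A sketch of that variant (an assumption on my part, since the example data only needs the single-ticker version):

SELECT
  ticker,
  company,
  date,
  price,
  price - LAG(price) OVER (PARTITION BY ticker ORDER BY date) AS daily_change  -- restarts at the first row of each ticker
FROM share;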
Example 5: Calculating Daily Percent Changes of New COVID-19 Cases

Epidemiologists and public health experts are under the spotlight these days, and under a lot of pressure. Their job is not easy right now, but it can be made a little bit easier by SQL window functions.

Imagine there's a pandemic going on in the world. OK, we don't have to imagine that. You have daily data for new COVID-19 cases in an imaginary country. For our amusement, let's call it Covidlandia. You need to analyze the data and calculate the daily percentage change in new cases. The data in the table covid_19_new_cases is, of course, completely made up. There are three columns:

- country: the country of the new cases
- date: the date of the new cases
- new_cases: the number of new cases

As in the previous example, let's talk about the logic and the mathematics first. How would you get the desired result without SQL? Say there are 78 new cases today and there were 54 new cases yesterday. You subtract yesterday's number from today's number, then divide the difference by yesterday's number. To get the percentage, multiply the result by 100. In other words:

(78 - 54) / 54 * 100 = 44.44%

Now, let's translate this into SQL code:

SELECT
  country,
  date,
  new_cases,
  (new_cases - LAG(new_cases) OVER (ORDER BY date)) / LAG(new_cases) OVER (ORDER BY date) * 100 AS daily_percent_change
FROM covid_19_new_cases;

Since you already understand the logic of the LAG() function from the previous example, I won't break the code into detailed steps. In the first part, you can see I've selected the country, date, and new_cases columns from the table covid_19_new_cases. Now comes the part that might look scary. But it's not; it's nearly the same code as in the previous example. Let's analyze it! First, I subtract the previous day's number of cases from today's number. That's exactly what the part new_cases - LAG(new_cases) OVER (ORDER BY date) does. Then I divide the result by the previous day's number of cases, which is LAG(new_cases) OVER (ORDER BY date). Finally, I multiply the result by 100 to get the percentage, which is shown in the column daily_percent_change.

The result can be shown as a table:

| country | date | new_cases | daily_percent_change |
|---|---|---|---|
| Covidlandia | 2020-06-01 | 12 | NULL |
| Covidlandia | 2020-06-02 | 18 | 50.00 |
| Covidlandia | 2020-06-03 | 17 | -5.56 |
| Covidlandia | 2020-06-04 | 25 | 47.06 |
| Covidlandia | 2020-06-05 | 32 | 28.00 |
| Covidlandia | 2020-06-06 | 38 | 18.75 |
| Covidlandia | 2020-06-07 | 40 | 5.26 |
| Covidlandia | 2020-06-08 | 45 | 12.50 |
| Covidlandia | 2020-06-09 | 57 | 26.67 |
| Covidlandia | 2020-06-10 | 112 | 96.49 |
| Covidlandia | 2020-06-11 | 158 | 41.07 |
| Covidlandia | 2020-06-12 | 158 | 0.00 |
| Covidlandia | 2020-06-13 | 174 | 10.13 |
| Covidlandia | 2020-06-14 | 184 | 5.75 |
| Covidlandia | 2020-06-15 | 190 | 3.26 |
| Covidlandia | 2020-06-16 | 187 | -1.58 |
| Covidlandia | 2020-06-17 | 184 | -1.60 |
| Covidlandia | 2020-06-18 | 204 | 10.87 |
| Covidlandia | 2020-06-19 | 208 | 1.96 |
| Covidlandia | 2020-06-20 | 208 | 0.00 |
| Covidlandia | 2020-06-21 | 212 | 1.92 |
| Covidlandia | 2020-06-22 | 248 | 16.98 |
| Covidlandia | 2020-06-23 | 357 | 43.95 |
| Covidlandia | 2020-06-24 | 419 | 17.37 |
| Covidlandia | 2020-06-25 | 416 | -0.72 |
| Covidlandia | 2020-06-26 | 403 | -3.13 |
| Covidlandia | 2020-06-27 | 400 | -0.74 |
| Covidlandia | 2020-06-28 | 396 | -1.00 |
| Covidlandia | 2020-06-29 | 396 | 0.00 |
| Covidlandia | 2020-06-30 | 347 | -12.37 |

Note that the first value is again NULL. The reason is the same as in the previous example.

Now that you've learned this calculation on a made-up data set, try analyzing real COVID-19 data. My colleague's article gives you very detailed guidance on how to do that, as well as showing some other uses of window functions.
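Two practical caveats that the query above doesn't address: if new_cases is stored as an integer, some databases will perform integer division and truncate the result, and a day with zero new cases would cause a division-by-zero error. Here's a more defensive sketch; the 100.0 literal, NULLIF(), and ROUND() are my additions, and the exact rounding syntax can vary slightly between databases:

SELECT
  country,
  date,
  new_cases,
  ROUND(
    (new_cases - LAG(new_cases) OVER (ORDER BY date)) * 100.0
    / NULLIF(LAG(new_cases) OVER (ORDER BY date), 0),   -- returns NULL instead of an error when yesterday had 0 cases
    2
  ) AS daily_percent_change
FROM covid_19_new_cases;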
Example 6: Calculating Moving Averages of Monthly Site Visits

This time, you work at an e-commerce company with three sites. You're a manager at the company, and you've asked your analyst to prepare a report showing the average monthly visits for each site the company owns. Your analyst is, of course, very experienced. He or she knows that a simple average of monthly visits is easy to calculate but would not smooth out the volatility and seasonality of the site visits. That's a reason to ask for a report with moving averages, which consider the current month and the two previous months.

In case you're not familiar with moving averages, here's a simple example. We have monthly data for hotel overnight stays, their overall average (i.e. the arithmetic mean), and the moving averages. The hospitality industry can be very seasonal, so the arithmetic mean can give you a very distorted picture. You can see that the average of monthly overnight stays is 2,165.83, while the moving averages range from 541.33 to 3,832.00, which gives a far more realistic picture. Here's the table to see for yourself:

| month | overnight_stays | average | moving_average |
|---|---|---|---|
| 01/2019 | 3,582 | 2,165.83 | 3,582.00 |
| 02/2019 | 1,802 | 2,165.83 | 2,692.00 |
| 03/2019 | 687 | 2,165.83 | 2,023.67 |
| 04/2019 | 248 | 2,165.83 | 912.33 |
| 05/2019 | 689 | 2,165.83 | 541.33 |
| 06/2019 | 2,250 | 2,165.83 | 1,062.33 |
| 07/2019 | 3,012 | 2,165.83 | 1,983.67 |
| 08/2019 | 5,897 | 2,165.83 | 3,719.67 |
| 09/2019 | 2,587 | 2,165.83 | 3,832.00 |
| 10/2019 | 482 | 2,165.83 | 2,988.67 |
| 11/2019 | 234 | 2,165.83 | 1,101.00 |
| 12/2019 | 4,520 | 2,165.83 | 1,745.33 |

We now move on to calculating moving averages using SQL window functions. Back to our monthly site visits. The table site_visit consists of the following columns:

- id: the ID of the visit
- site: the name of the site
- month: the month of the visits
- number_of_visits: the number of site visits

The code below is what you need:

SELECT
  id,
  site,
  month,
  number_of_visits,
  AVG(number_of_visits) OVER (
    PARTITION BY site
    ORDER BY month
    ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING
  ) AS moving_average_visits
FROM site_visit;

Let me explain what I'm doing here. First, I select all the columns from the table site_visit. Then comes the interesting part! I use AVG(number_of_visits) as a window function, since I want the average of the site visits. Then comes the OVER() clause, as always. Since I want the averages separately for each site rather than for all three sites together, I use PARTITION BY site. This means I'm calculating the averages at the site level. The operation needs to be performed sequentially by month, not in some random order; to ensure this, there is ORDER BY month. I want to calculate the moving averages over the current month and the two previous months to smooth out the volatility. This is defined by ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING, which takes three rows into account when calculating the moving average: the current row and the two rows preceding it. Depending on how you want to calculate the moving averages, you can increase or decrease the number of preceding and following rows. For example, if you write ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING, you're taking a total of six rows into account: the current row, the three rows preceding it, and the two rows following it.
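A small side note on the frame clause, beyond what this example needs: in ROWS mode, 0 FOLLOWING means the same thing as CURRENT ROW, so you will often see the same moving average written like this:

SELECT
  id,
  site,
  month,
  number_of_visits,
  AVG(number_of_visits) OVER (
    PARTITION BY site
    ORDER BY month
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- equivalent to "0 FOLLOWING"
  ) AS moving_average_visits
FROM site_visit;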
Here's my report:

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |
| 3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67 |
| 4 | E-commerceSuperSite | 04/2019 | 45,987,452 | 45,899,115.33 |
| 5 | E-commerceSuperSite | 05/2019 | 46,124,756 | 45,989,891.00 |
| 6 | E-commerceSuperSite | 06/2019 | 46,125,746 | 46,079,318.00 |
| 7 | E-commerceSuperSite | 07/2019 | 46,124,756 | 46,125,086.00 |
| 8 | E-commerceSuperSite | 08/2019 | 46,114,784 | 46,121,762.00 |
| 9 | E-commerceSuperSite | 09/2019 | 46,125,411 | 46,121,650.33 |
| 10 | E-commerceSuperSite | 10/2019 | 46,125,784 | 46,121,993.00 |
| 11 | E-commerceSuperSite | 11/2019 | 46,178,421 | 46,143,205.33 |
| 12 | E-commerceSuperSite | 12/2019 | 46,170,254 | 46,158,153.00 |
| 1 | GreatSite4U | 01/2019 | 56,789 | 56,789.00 |
| 2 | GreatSite4U | 02/2019 | 74,564 | 65,676.50 |
| 3 | GreatSite4U | 03/2019 | 85,426 | 72,259.67 |
| 4 | GreatSite4U | 04/2019 | 72,547 | 77,512.33 |
| 5 | GreatSite4U | 05/2019 | 75,000 | 77,657.67 |
| 6 | GreatSite4U | 06/2019 | 92,546 | 80,031.00 |
| 7 | GreatSite4U | 07/2019 | 89,546 | 85,697.33 |
| 8 | GreatSite4U | 08/2019 | 87,237 | 89,776.33 |
| 9 | GreatSite4U | 09/2019 | 87,412 | 88,065.00 |
| 10 | GreatSite4U | 10/2019 | 76,398 | 83,682.33 |
| 11 | GreatSite4U | 11/2019 | 69,874 | 77,894.67 |
| 12 | GreatSite4U | 12/2019 | 84,417 | 76,896.33 |
| 1 | PleaseVisit | 01/2019 | 897 | 897.00 |
| 2 | PleaseVisit | 02/2019 | 658 | 777.50 |
| 3 | PleaseVisit | 03/2019 | 2,587 | 1,380.67 |
| 4 | PleaseVisit | 04/2019 | 6,845 | 3,363.33 |
| 5 | PleaseVisit | 05/2019 | 10,254 | 6,562.00 |
| 6 | PleaseVisit | 06/2019 | 11,487 | 9,528.67 |
| 7 | PleaseVisit | 07/2019 | 13,345 | 11,695.33 |
| 8 | PleaseVisit | 08/2019 | 14,897 | 13,243.00 |
| 9 | PleaseVisit | 09/2019 | 15,497 | 14,579.67 |
| 10 | PleaseVisit | 10/2019 | 18,845 | 16,413.00 |
| 11 | PleaseVisit | 11/2019 | 28,467 | 20,936.33 |
| 12 | PleaseVisit | 12/2019 | 84,417 | 43,909.67 |

Let's analyze the result a bit to understand how the calculation works. The first moving average is the same as the number of visits. This is expected, because there is no data before this row. SQL takes the total number of visits in the frame, divides it by the number of rows available (which is one, in this case), and returns a result equal to the number of visits for the month.

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |

Did you expect the moving average in the second row to be equal to the number of visits for the month, since there's not enough history? If you did, you'd be wrong! This moving average sums the current row and the row before it, then divides the result by the number of rows actually available in the frame. The sum is divided by two, even though the frame asks for up to three rows. Let's check the result:

(45,789,465 + 45,852,429) / 2 = 45,820,947.00

It is correct!
| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |

The third row finally does what I want: it takes the current row and the two rows before it and returns the average. Let's check:

(45,789,465 + 45,852,429 + 45,857,465) / 3 = 45,833,119.67

Correct again! No need to check anything else; maybe we should just start trusting SQL!

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |
| 3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67 |

Calculating Running Totals

Running totals are found in many different kinds of analysis. They can be very helpful in financial analysis, for example, and SQL window functions give you the tools to calculate them easily. Running totals are also called cumulative sums, since they add the current value to the total of all previous values. This article about running totals has an approachable explanation, with several examples of how they are used. That doesn't mean I won't show you some examples too!

Example 7: Calculating Running Totals of the Debt Collected by Your Call Center

It seems every company has a call center today. Banks, other financial institutions, debt collection agencies, telecom companies, you name it. One purpose of a call center is to remind customers of their unpaid bills. You're monitoring the efficiency of the call center and want to analyze the amounts collected after the call center contacts the customers. The table is debt_collected, and its columns are as follows:

- id: the ID of the debt collected
- month: the month when the debt was collected
- amount: the amount of the debt collected

To get the running totals, you need this code:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS debt_collected_rt
FROM debt_collected;

First, I select the columns from the table debt_collected. Then I need to calculate the running totals. To do that, I use SUM() as a window function, with the column to be totaled specified in the parentheses. Then comes the OVER() clause; PARTITION BY is omitted because I want the running total over all the available data. The operation has to be performed sequentially, from January to December, and not in some random order; hence the data is ordered by month. The result appears in the column debt_collected_rt.

The result of the query looks like this:

| id | month | amount | debt_collected_rt |
|---|---|---|---|
| 1 | 01/2019 | 575,457.28 | 575,457.28 |
| 2 | 02/2019 | 578,200.85 | 1,153,658.13 |
| 3 | 03/2019 | 567,257.77 | 1,720,915.90 |
| 4 | 04/2019 | 657,452.12 | 2,378,368.02 |
| 5 | 05/2019 | 622,157.42 | 3,000,525.44 |
| 6 | 06/2019 | 608,745.47 | 3,609,270.91 |
| 7 | 07/2019 | 594,122.33 | 4,203,393.24 |
| 8 | 08/2019 | 591,114.49 | 4,794,507.73 |
| 9 | 09/2019 | 541,258.68 | 5,335,766.41 |
| 10 | 10/2019 | 584,127.11 | 5,919,893.52 |
| 11 | 11/2019 | 587,774.43 | 6,507,667.95 |
| 12 | 12/2019 | 596,471.87 | 7,104,139.82 |

Example 8: Calculating Running Totals of the Employee Costs

Let's practice a little more with a similar example. What if you're an HR manager and need to see the monthly employee costs, together with the running totals? For instance, if your task is to plan the budget for the next year, you need some historical data to help you. The previous year's budget is always a good starting point.

The data can be found in the table employee_costs, and its columns are:

- id: the ID of the employee costs
- month: the month of the employee costs
- amount: the amount of the employee costs

The code is practically the same as in the previous example:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS costs_rt
FROM employee_costs;

I've selected all the columns from the table employee_costs. The window function sums the amount month by month and puts the result in the new column costs_rt.

| id | month | amount | costs_rt |
|---|---|---|---|
| 1 | 01/2019 | 84,992.57 | 84,992.57 |
| 2 | 02/2019 | 87,562.24 | 172,554.81 |
| 3 | 03/2019 | 86,451.82 | 259,006.63 |
| 4 | 04/2019 | 86,451.82 | 345,458.45 |
| 5 | 05/2019 | 85,456.13 | 430,914.58 |
| 6 | 06/2019 | 86,782.45 | 517,697.03 |
| 7 | 07/2019 | 88,253.45 | 605,950.48 |
| 8 | 08/2019 | 88,795.64 | 694,746.12 |
| 9 | 09/2019 | 89,974.34 | 784,720.46 |
| 10 | 10/2019 | 92,444.44 | 877,164.90 |
| 11 | 11/2019 | 93,012.55 | 970,177.45 |
| 12 | 12/2019 | 93,999.14 | 1,064,176.59 |
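A possible extension, not part of the original report: if you also wanted each month's share of the annual total next to the running total, you could add a second SUM() with an empty OVER(), which makes the window span the whole table. A sketch against the same employee_costs table; the ROUND() call and the pct_of_year alias are my additions:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS costs_rt,
  ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS pct_of_year  -- the month's share of the yearly total
FROM employee_costs;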
Example 9: Calculating Running Totals of Quarterly Sales

If you're a financial analyst, a sales manager, a regional manager, or any kind of manager, you've probably seen reports like this. In this scenario, you work for a company with five geographical regions. You have data on quarterly sales by region. What you want is a report showing the running total of sales separately for each region. All the data you need is in the table regional_sales. The columns are:

- id: the ID of the sales
- region: the name of the region
- quarter: the quarter of the sales
- amount: the amount of the sales

Here's a simple query that gives you the required results:

SELECT
  id,
  region,
  quarter,
  amount,
  SUM(amount) OVER (PARTITION BY region ORDER BY quarter) AS regional_sales_rt
FROM regional_sales;

As always, I select all the columns from the table. Then comes the window function part. I sum up the amount using the SUM() function. The window is defined by OVER(). There is a PARTITION BY clause this time, because I want to see the data by region. The data needs to be summed up in order by quarter, so I order the data by quarter. Finally, the resulting data is shown in the column called regional_sales_rt.

| id | region | quarter | amount | regional_sales_rt |
|---|---|---|---|---|
| 1 | Central Europe | 1Q2019 | 7,854,127.32 | 7,854,127.32 |
| 2 | Central Europe | 2Q2019 | 7,782,112.23 | 15,636,239.55 |
| 3 | Central Europe | 3Q2019 | 7,612,556.88 | 23,248,796.43 |
| 4 | Central Europe | 4Q2019 | 8,023,448.77 | 31,272,245.20 |
| 5 | Eastern Europe | 1Q2019 | 5,412,444.62 | 5,412,444.62 |
| 6 | Eastern Europe | 2Q2019 | 5,208,412.37 | 10,620,856.99 |
| 7 | Eastern Europe | 3Q2019 | 5,132,445.58 | 15,753,302.57 |
| 8 | Eastern Europe | 4Q2019 | 5,800,613.22 | 21,553,915.79 |
| 13 | Northern Europe | 1Q2019 | 3,541,222.14 | 3,541,222.14 |
| 14 | Northern Europe | 2Q2019 | 3,247,772.67 | 6,788,994.81 |
| 15 | Northern Europe | 3Q2019 | 3,456,773.29 | 10,245,768.10 |
| 16 | Northern Europe | 4Q2019 | 3,320,520.84 | 13,566,288.94 |
| 17 | Southern Europe | 1Q2019 | 1,482,222.66 | 1,482,222.66 |
| 18 | Southern Europe | 2Q2019 | 1,628,741.56 | 3,110,964.22 |
| 19 | Southern Europe | 3Q2019 | 2,208,456.03 | 5,319,420.25 |
| 20 | Southern Europe | 4Q2019 | 2,485,212.33 | 7,804,632.58 |
| 9 | Western Europe | 1Q2019 | 11,285,774.26 | 11,285,774.26 |
| 10 | Western Europe | 2Q2019 | 11,487,662.29 | 22,773,436.55 |
| 11 | Western Europe | 3Q2019 | 12,564,442.83 | 35,337,879.38 |
| 12 | Western Europe | 4Q2019 | 11,662,451.18 | 47,000,330.56 |

Do You Want to Learn SQL Window Functions?

With these nine examples, I've tried to show you various scenarios in which you could find SQL window functions helpful. What you've learned about window functions here should make it easier for you to start learning through our Window Functions course. The examples in this article don't cover everything window functions can do. Instead of overwhelming you with all their possibilities, I've presented only a handful of uses.
The point was to show you how one function can be helpful in several different jobs. As you have seen, learning about window functions is a wise move for every manager and data analyst. Of course, the usefulness of window functions doesn't stop there. Experts working in sales, retail, inventory, e-commerce, product and supply chain management, finance, and public health, among others, will find that learning about window functions makes their jobs easier.

I hope I've managed to find at least one example that relates to your everyday tasks. If I haven't, it doesn't mean SQL window functions can't be useful to you. It just reflects the wide variety of scenarios in which window functions can be used; that variety is impossible to cover with just a handful of examples. If you think a little about your day-to-day tasks, I'm sure you'll find situations where you could apply window functions.

I'd like to hear from you in the comments section. Feel free to share your experience with SQL window functions and how you use them.