Who Should Learn SQL Window Functions?
By Tihomir Babic

Table of Contents
- Ranking Data
  - Example 1: The Top 15 Salespersons
  - Example 2: The 15 Worst-Selling Products
  - Example 3: Ranking Inventory by Region
- Analyzing Trends Over Time
  - Example 4: Calculating Daily Changes of a Share Price
  - Example 5: Calculating Daily Percent Changes of New COVID-19 Cases
  - Example 6: Calculating Moving Averages of Monthly Site Visits
- Calculating Running Totals
  - Example 7: Calculating Running Totals of the Debt Collected by Your Call Center
  - Example 8: Calculating Running Totals of the Employee Costs
  - Example 9: Calculating Running Totals of Quarterly Sales
- Do You Want to Learn SQL Window Functions?

Do you want to learn how SQL window functions can help you at your job? This article will show you examples from various business applications where they can be very useful.

I won't be explaining what SQL window functions are in this article, but rather how to use them. If you're not familiar with window functions or their syntax, don't worry. Here's an article that can help you with an introduction to SQL window functions.

SQL window functions are very helpful when creating any kind of report. They are typically used for ranking data and calculating running totals. They're especially useful with time-series data, such as calculating the differences between the previous and current periods. You can calculate moving averages or even percentage changes compared to previous periods.

This is applicable to a wide range of professions: from financial experts and analysts to those working in sales, retail, e-commerce, supply chain management, and product management, among others. Public health experts and epidemiologists analyze data using SQL window functions. All decision makers, from team leaders to top management, could benefit from understanding window functions.

SQL window functions can do plenty of other calculations. The examples below will focus only on the calculations I've already mentioned. And instead of describing every single possibility, I'll review the most common uses and present them on different types of data.

To find out what window functions are, read a detailed explanation by the creator of LearnSQL.com's course on window functions.
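One more thing before the examples: here is a minimal sketch of the general shape that every query in this article follows. The table and column names (some_table, other_column, grouping_column, and so on) are placeholders rather than objects from the examples below:

SELECT
  some_column,
  SUM(other_column) OVER (          -- SUM() stands in for any window function: ROW_NUMBER(), AVG(), LAG(), ...
    PARTITION BY grouping_column    -- optional: restart the calculation for each group
    ORDER BY ordering_column        -- optional: order the rows inside each window
  ) AS result_column
FROM some_table;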
Moving on to the examples!

Ranking Data

One of the most common and easiest uses of window functions is ranking data. You can do this with several different functions, such as RANK(), DENSE_RANK(), and ROW_NUMBER(). I'll use ROW_NUMBER() in the following examples. No special reason; it's just that I have yet to use ROW_NUMBER() in my writings about window functions, so it's about time to change that!

Example 1: The Top 15 Salespersons

If you work as a data analyst, you've probably had to create reports that contain, well, top lists: be it the top salespersons, best-selling products, most visited websites, etc. It doesn't matter; if there's data, you can create a report. You've probably seen such lists if you're an employee in a sales department, and your future in the company might depend on them. If you're the manager of a sales department, I'm sure this is one of your favorite reports. Have you ever tried to create one yourself? It's easy!

For this example, the data is stored in a table called salesperson. It consists of the following columns:

- id: the ID of the salesperson
- first_name: the first name of the salesperson
- last_name: the last name of the salesperson
- sales_2019: sales achieved during 2019

Write a little query like this one and you'll get the result:

SELECT
  ROW_NUMBER() OVER (ORDER BY sales_2019 DESC) AS salesperson_rank,
  first_name,
  last_name,
  sales_2019
FROM salesperson;

What does this code do? It uses the ROW_NUMBER() function, which assigns a row number to every record in the table. With the OVER() clause, you define the window over which the operation is performed. Omitting PARTITION BY from the OVER() clause means the operation is performed over the whole table. Ordering by sales_2019 in descending order means the best salesperson appears in the first row. The row number is shown in the new column, which I have named salesperson_rank. The code then selects the remaining columns from the table salesperson.

There you have it, your top 15 salespeople:

| salesperson_rank | first_name | last_name | sales_2019 |
|---|---|---|---|
| 1 | Eachelle | Lound | 573,170.97 |
| 2 | Franklin | Loins | 568,735.06 |
| 3 | Land | Longhurst | 564,691.01 |
| 4 | Nedi | Barsham | 555,217.57 |
| 5 | Tina | Ludlow | 538,225.30 |
| 6 | Dorise | Gasking | 519,220.00 |
| 7 | Ellswerth | Divis | 513,243.52 |
| 8 | Ana | Golda | 512,555.12 |
| 9 | Lucie | Brewster | 511,441.47 |
| 10 | Curran | Daouze | 504,939.45 |
| 11 | Constanta | Khomishin | 504,017.19 |
| 12 | Bronson | Joburn | 492,430.44 |
| 13 | Yvonne | Playhill | 489,094.94 |
| 14 | Hortensia | Hartness | 488,289.00 |
| 15 | Phillip | Mulqueeny | 484,875.87 |

If you feel shaky about the syntax, have our SQL window functions cheat sheet open and refer to it while going through the examples.

Example 2: The 15 Worst-Selling Products

For this example, let's imagine you work as a product manager. You oversaw the creation and launch of thirty new products in the last 12 months for your company. Now you're interested in seeing which products are not performing well, so you can develop strategies to increase their Amazon sales or replace them with another product. The data about the new products is in the table new_products. It consists of the following columns:

- id: the ID of the product
- product_name: the name of the product
- number_sold: the quantity of the product sold

Of course, the code that will get you the desired report is similar to the one in the previous example. Here it is:

SELECT
  ROW_NUMBER() OVER (ORDER BY number_sold ASC) AS worst_products,
  product_name,
  number_sold
FROM new_products;

I've again used the ROW_NUMBER() function. In the OVER() clause, I want the data to be ranked according to the number_sold column. It has to be ranked in ascending order, hence the ASC in the code. Row numbers will be shown in the column worst_products. The rest of the code selects the remaining columns from the table new_products.

Run the code, and you will see your worst-performing products ranked from the bottom up; the first 15 rows are your 15 worst sellers:

| worst_products | product_name | number_sold |
|---|---|---|
| 1 | ChicoReal | 4,567 |
| 2 | WillowBook | 4,587 |
| 3 | Somoon | 6,587 |
| 4 | DaskaPeetal | 7,821 |
| 5 | Huisterdenkaart | 8,564 |
| 6 | OneZemalyac | 12,284 |
| 7 | Streechek | 12,284 |
| 8 | BarbieQue | 14,562 |
| 9 | Bleetwa | 14,587 |
| 10 | Leecymur | 14,587 |
| 11 | Yegulya | 14,887 |
| 12 | Egesmeder | 18,357 |
| 13 | Kuymuck | 20,140 |
| 14 | MrBasil | 22,568 |
| 15 | ZulufAlba | 31,400 |
| 16 | WishyWashy | 48,592 |
| 17 | RobiKnotebook | 55,678 |
| 18 | Dramalone | 56,897 |
| 19 | FragolinoDiMonfalcone | 66,987 |
| 20 | KerberQama | 78,521 |

You'll probably notice that OneZemalyac and Streechek have the same number of products sold. The same is the case with the products Bleetwa and Leecymur. However, they don't have the same rank, because ROW_NUMBER() numbers rows sequentially. OneZemalyac and Streechek are ranked 6th and 7th, not 6th and 6th. That is, ranking with the ROW_NUMBER() function does not allow for ties.
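If you would rather have tied products share the same rank, one option is to swap ROW_NUMBER() for RANK() or DENSE_RANK(). Here's a sketch against the same new_products table; the column aliases rank_with_gaps and rank_without_gaps are just illustrative names I've chosen:

SELECT
  RANK() OVER (ORDER BY number_sold ASC) AS rank_with_gaps,          -- ties share a rank; the next rank is skipped
  DENSE_RANK() OVER (ORDER BY number_sold ASC) AS rank_without_gaps, -- ties share a rank; no gaps afterwards
  product_name,
  number_sold
FROM new_products;

With RANK(), OneZemalyac and Streechek would both be ranked 6th and the next product 8th; with DENSE_RANK(), the next product would be 7th.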
Example 3: Ranking Inventory by Region

This example might be of interest to someone managing inventory or working in supply chain management. You work for a company that has factories in four regions across the country. All the factories produce the same five products, and the goods produced are stored in regional warehouses. The table inventory has the following information:

- id: the ID of the product
- product_name: the name of the product
- quantity: the quantity of the product stored in the regional warehouse at the end of the year
- region: the name of the region

You want to improve product distribution to the customers, which should result in minimizing the quantity of goods stored as inventory in the regional warehouses. You want to start by ranking the products within each region by their inventories in the warehouse. How would you get the desired report using SQL window functions? Here it is:

SELECT
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
  product_name,
  quantity,
  region
FROM inventory;

As in the earlier examples, the ROW_NUMBER() function gives you what you want. However, I'm using the PARTITION BY clause this time. With this clause, I define the partitions over which the operation (in this case, ranking) is performed by specifying the column whose values define the groups. If I omitted the PARTITION BY clause, the data would be ranked across the whole table, i.e. regardless of the region. Since I'm interested in seeing the data by region, I've partitioned by the column region. The data is ordered by quantity in descending order, and the rank is shown in the column inventory_rank. The remainder of the code selects the other columns from the table inventory.

Run the code, and you'll get the report very quickly:

| inventory_rank | product_name | quantity | region |
|---|---|---|---|
| 1 | POW872 | 10,000 | East |
| 2 | RWU875 | 9,845 | East |
| 3 | IOE935 | 7,894 | East |
| 4 | KFUO24 | 6,894 | East |
| 5 | HGX314 | 1,000 | East |
| 1 | POW872 | 9,457 | North |
| 2 | HGX314 | 8,524 | North |
| 3 | RWU875 | 4,825 | North |
| 4 | IOE935 | 1,578 | North |
| 5 | KFUO24 | 75 | North |
| 1 | KFUO24 | 14,587 | South |
| 2 | RWU875 | 12,845 | South |
| 3 | POW872 | 7,542 | South |
| 4 | HGX314 | 754 | South |
| 5 | IOE935 | 82 | South |
| 1 | HGX314 | 12,587 | West |
| 2 | KFUO24 | 12,300 | West |
| 3 | RWU875 | 4,852 | West |
| 4 | POW872 | 4,489 | West |
| 5 | IOE935 | 518 | West |

If you want to learn more about this topic, here's an article about ranking data using window functions.
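A possible next step, not part of the original report: if you only wanted the product with the largest inventory in each region, you could wrap the ranking in a subquery and filter on it, since a window function can't be used directly in a WHERE clause. A sketch, assuming the same inventory table (the alias ranked is just an illustrative name):

SELECT product_name, quantity, region
FROM (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY quantity DESC) AS inventory_rank,
    product_name,
    quantity,
    region
  FROM inventory
) ranked
WHERE inventory_rank = 1;  -- keep only the top-ranked product per region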
Analyzing Trends Over Time

SQL window functions truly show their power in analyzing trends over time. You can calculate the differences between the previous and the current periods, get the percentage increase or decrease compared to previous periods, or calculate moving averages. This is often used by brokers, fund managers, or any other financial experts who monitor historical data and build or use forecasting models. What you read about COVID-19 every day, such as the daily numbers of people infected, recovered, or deceased and the estimates of the future development of the pandemic, is based on analyzing trends over time. Epidemiologists and public health experts do this daily. Take any manager in any company in the world, and their decisions are based on analyzing historical data.

Let's start by calculating the differences between periods.

Example 4: Calculating Daily Changes of a Share Price

If you're a fund manager, or any kind of investor, you'll likely be interested in this analysis quite often. If you work as a financial analyst, you've probably done it quite frequently. It doesn't have to be a share price; it can be any data for which you want to compare the current and the previous periods. And it doesn't matter whether you're looking at daily, weekly, monthly, quarterly, or yearly changes; SQL window functions work the same way.

In this example, you have the daily prices of one share. You want to calculate the differences day over day, so you can build a model and forecast future price changes. All the data you need is in the table share, which contains the following columns:

- ticker: the ticker symbol of the share, i.e. the short name under which it is traded
- company: the name of the company that issued the share
- date: the date traded
- price: the price of the share

First, let's think about the logic. It'll make writing the code easier. What you want to do here is take the price from the previous day and subtract it from the price of the current day. And you need to do that for every record you have in the table.

Now, let's try to translate this logic into code:

SELECT
  ticker,
  company,
  date,
  price,
  LAG(price) OVER (ORDER BY date) AS previous_day_price
FROM share;

There's nothing strange in the first part of the code; I've simply selected all the columns from the table share. Now comes the fun part! There's something called LAG(). It allows you to go back a certain number of rows and show the data from that row in the current row. You can go back any number of rows; the default is 1, which is why LAG(price) goes back just one row. Even though I didn't spell it out as LAG(price, 1), it does exactly what I need. The remainder of the code is like the window functions you've already seen. There's an OVER() clause, and I want the rows ordered by date. The data will be shown in the column previous_day_price.

Run the code to get the table below:

| ticker | company | date | price | previous_day_price |
|---|---|---|---|---|
| PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL |
| PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 45.32 |
| PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 46.38 |
| PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 47.12 |
| PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 47.12 |

However, this is not the analysis you wanted. You're not interested in just seeing the previous day's price, are you? What you want is to calculate the difference between the current price and the price of the day before. How would you do this in one step, now that you know what the LAG() function can do? Yes, it's this simple:

SELECT
  ticker,
  company,
  date,
  price,
  price - LAG(price) OVER (ORDER BY date) AS daily_change
FROM share;

The code is still pretty much the same! The only difference is that the value returned by LAG() is subtracted from the price. Yes, it does mean what you think it means! I subtracted the previous day's price from the current price, with the result shown in the column daily_change.

Finally, here's the table you want to see:

| ticker | company | date | price | daily_change |
|---|---|---|---|---|
| PTA | Panthelya Inc. | 2020-06-01 | 45.32 | NULL |
| PTA | Panthelya Inc. | 2020-06-02 | 46.38 | 1.06 |
| PTA | Panthelya Inc. | 2020-06-03 | 47.12 | 0.74 |
| PTA | Panthelya Inc. | 2020-06-04 | 47.12 | 0 |
| PTA | Panthelya Inc. | 2020-06-05 | 52.32 | 5.2 |
| PTA | Panthelya Inc. | 2020-06-06 | 58.18 | 5.86 |
| PTA | Panthelya Inc. | 2020-06-07 | 59 | 0.82 |
| PTA | Panthelya Inc. | 2020-06-08 | 62.54 | 3.54 |
| PTA | Panthelya Inc. | 2020-06-09 | 58.64 | -3.9 |
| PTA | Panthelya Inc. | 2020-06-10 | 60.08 | 1.44 |
| PTA | Panthelya Inc. | 2020-06-11 | 69.84 | 9.76 |
| PTA | Panthelya Inc. | 2020-06-12 | 43.22 | -26.62 |
| PTA | Panthelya Inc. | 2020-06-13 | 52.22 | 9 |
| PTA | Panthelya Inc. | 2020-06-14 | 77.54 | 25.32 |
| PTA | Panthelya Inc. | 2020-06-15 | 94.21 | 16.67 |
| PTA | Panthelya Inc. | 2020-06-16 | 92.84 | -1.37 |
| PTA | Panthelya Inc. | 2020-06-17 | 92.75 | -0.09 |
| PTA | Panthelya Inc. | 2020-06-18 | 93 | 0.25 |
| PTA | Panthelya Inc. | 2020-06-19 | 92.84 | -0.16 |
| PTA | Panthelya Inc. | 2020-06-20 | 94.45 | 1.61 |
| PTA | Panthelya Inc. | 2020-06-21 | 94.49 | 0.04 |
| PTA | Panthelya Inc. | 2020-06-22 | 94.21 | -0.28 |
| PTA | Panthelya Inc. | 2020-06-23 | 98.18 | 3.97 |
| PTA | Panthelya Inc. | 2020-06-24 | 92.27 | -5.91 |
| PTA | Panthelya Inc. | 2020-06-25 | 97.84 | 5.57 |
| PTA | Panthelya Inc. | 2020-06-26 | 42.56 | -55.28 |
| PTA | Panthelya Inc. | 2020-06-27 | 32.54 | -10.02 |
| PTA | Panthelya Inc. | 2020-06-28 | 28.63 | -3.91 |
| PTA | Panthelya Inc. | 2020-06-29 | 30.24 | 1.61 |
| PTA | Panthelya Inc. | 2020-06-30 | 38.64 | 8.4 |

The daily_change in the first row is NULL because the first day in the table has no previous price to subtract.
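The share table in this example holds a single ticker. If it held several, you wouldn't want one company's last price subtracted from another company's first, so you would add PARTITION BY ticker to restart the calculation for each share. A sketch of that variant (an assumption on my part, since the example data only needs the single-ticker version):

SELECT
  ticker,
  company,
  date,
  price,
  price - LAG(price) OVER (PARTITION BY ticker ORDER BY date) AS daily_change  -- restarts at the first row of each ticker
FROM share;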
Example 5: Calculating Daily Percent Changes of New COVID-19 Cases

Epidemiologists and public health experts are under the spotlight these days, and under a lot of pressure. Their job is not easy right now, but it can be made a little bit easier by SQL window functions.

Imagine there's a pandemic going on in the world. OK, we don't have to imagine that. You have daily data for new COVID-19 cases in an imaginary country. For our amusement, let's call it Covidlandia. You need to analyze the data and calculate the daily percentage change in new cases. The data in the table covid_19_new_cases is, of course, completely made up. There are three columns:

- country: the country of the new cases
- date: the date of the new cases
- new_cases: the number of new cases

As in the previous example, let's talk about the logic and the mathematics first. How would you get the desired result without SQL? Say there are 78 new cases today and there were 54 new cases yesterday. You subtract yesterday's number from today's number, then divide the difference by yesterday's number. To get the percentage, multiply the result by 100. In other words:

(78 - 54) / 54 * 100 = 44.44%

Now, let's translate this into SQL code:

SELECT
  country,
  date,
  new_cases,
  (new_cases - LAG(new_cases) OVER (ORDER BY date)) / LAG(new_cases) OVER (ORDER BY date) * 100 AS daily_percent_change
FROM covid_19_new_cases;

Since you already understand the logic of the LAG() function from the previous example, I won't break the code into detailed steps. In the first part, you can see I've selected the country, date, and new_cases columns from the table covid_19_new_cases. Now comes the part that might look scary. But it's not; it's nearly the same code as in the previous example. Let's analyze it! First, I subtract the previous day's number of cases from today's number. That's exactly what the part new_cases - LAG(new_cases) OVER (ORDER BY date) does. Then I divide the result by the previous day's number of cases, which is LAG(new_cases) OVER (ORDER BY date). Finally, I multiply the result by 100 to get the percentage, which is shown in the column daily_percent_change.

The result can be shown as a table:

| country | date | new_cases | daily_percent_change |
|---|---|---|---|
| Covidlandia | 2020-06-01 | 12 | NULL |
| Covidlandia | 2020-06-02 | 18 | 50.00 |
| Covidlandia | 2020-06-03 | 17 | -5.56 |
| Covidlandia | 2020-06-04 | 25 | 47.06 |
| Covidlandia | 2020-06-05 | 32 | 28.00 |
| Covidlandia | 2020-06-06 | 38 | 18.75 |
| Covidlandia | 2020-06-07 | 40 | 5.26 |
| Covidlandia | 2020-06-08 | 45 | 12.50 |
| Covidlandia | 2020-06-09 | 57 | 26.67 |
| Covidlandia | 2020-06-10 | 112 | 96.49 |
| Covidlandia | 2020-06-11 | 158 | 41.07 |
| Covidlandia | 2020-06-12 | 158 | 0.00 |
| Covidlandia | 2020-06-13 | 174 | 10.13 |
| Covidlandia | 2020-06-14 | 184 | 5.75 |
| Covidlandia | 2020-06-15 | 190 | 3.26 |
| Covidlandia | 2020-06-16 | 187 | -1.58 |
| Covidlandia | 2020-06-17 | 184 | -1.60 |
| Covidlandia | 2020-06-18 | 204 | 10.87 |
| Covidlandia | 2020-06-19 | 208 | 1.96 |
| Covidlandia | 2020-06-20 | 208 | 0.00 |
| Covidlandia | 2020-06-21 | 212 | 1.92 |
| Covidlandia | 2020-06-22 | 248 | 16.98 |
| Covidlandia | 2020-06-23 | 357 | 43.95 |
| Covidlandia | 2020-06-24 | 419 | 17.37 |
| Covidlandia | 2020-06-25 | 416 | -0.72 |
| Covidlandia | 2020-06-26 | 403 | -3.13 |
| Covidlandia | 2020-06-27 | 400 | -0.74 |
| Covidlandia | 2020-06-28 | 396 | -1.00 |
| Covidlandia | 2020-06-29 | 396 | 0.00 |
| Covidlandia | 2020-06-30 | 347 | -12.37 |

Note that the first value is again NULL. The reason is the same as in the previous example.

Now that you've learned this calculation on a made-up data set, try analyzing real COVID-19 data. My colleague's article gives you very detailed guidance on how to do that, as well as showing some other uses of window functions.
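Two practical caveats that the query above doesn't address: if new_cases is stored as an integer, some databases will perform integer division and truncate the result, and a day with zero new cases would cause a division-by-zero error. Here's a more defensive sketch; the 100.0 literal, NULLIF(), and ROUND() are my additions, and the exact rounding syntax can vary slightly between databases:

SELECT
  country,
  date,
  new_cases,
  ROUND(
    (new_cases - LAG(new_cases) OVER (ORDER BY date)) * 100.0
    / NULLIF(LAG(new_cases) OVER (ORDER BY date), 0),   -- returns NULL instead of an error when yesterday had 0 cases
    2
  ) AS daily_percent_change
FROM covid_19_new_cases;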
Example 6: Calculating Moving Averages of Monthly Site Visits

This time, you work at an e-commerce company with three sites. You're a manager at the company, and you've asked your analyst to prepare a report showing the average monthly visits for each site the company owns. Your analyst is, of course, very experienced. He or she knows that a simple average of monthly visits is easy to calculate but would not smooth out the volatility and seasonality of the site visits. That's a reason to ask for a report with moving averages, which consider the current month and the two previous months.

In case you're not familiar with moving averages, here's a simple example. We have monthly data for hotel overnight stays, their overall average (i.e. the arithmetic mean), and the moving averages. The hospitality industry can be very seasonal, so the arithmetic mean can give you a very distorted picture. You can see that the average of monthly overnight stays is 2,165.83, while the moving averages range from 541.33 to 3,832.00, which gives a far more realistic picture. Here's the table to see for yourself:

| month | overnight_stays | average | moving_average |
|---|---|---|---|
| 01/2019 | 3,582 | 2,165.83 | 3,582.00 |
| 02/2019 | 1,802 | 2,165.83 | 2,692.00 |
| 03/2019 | 687 | 2,165.83 | 2,023.67 |
| 04/2019 | 248 | 2,165.83 | 912.33 |
| 05/2019 | 689 | 2,165.83 | 541.33 |
| 06/2019 | 2,250 | 2,165.83 | 1,062.33 |
| 07/2019 | 3,012 | 2,165.83 | 1,983.67 |
| 08/2019 | 5,897 | 2,165.83 | 3,719.67 |
| 09/2019 | 2,587 | 2,165.83 | 3,832.00 |
| 10/2019 | 482 | 2,165.83 | 2,988.67 |
| 11/2019 | 234 | 2,165.83 | 1,101.00 |
| 12/2019 | 4,520 | 2,165.83 | 1,745.33 |

We now move on to calculating moving averages using SQL window functions. Back to our monthly site visits. The table site_visit consists of the following columns:

- id: the ID of the visit
- site: the name of the site
- month: the month of the visits
- number_of_visits: the number of site visits

The code below is what you need:

SELECT
  id,
  site,
  month,
  number_of_visits,
  AVG(number_of_visits) OVER (
    PARTITION BY site
    ORDER BY month
    ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING
  ) AS moving_average_visits
FROM site_visit;

Let me explain what I'm doing here. First, I select all the columns from the table site_visit. Then comes the interesting part! I use AVG(number_of_visits) as a window function, since I want the average of the site visits. Then comes the OVER() clause, as always. Since I want the averages separately for each site rather than for all three sites together, I use PARTITION BY site. This means I'm calculating the averages at the site level. The operation needs to be performed sequentially by month, not in some random order; to ensure this, there is ORDER BY month. I want to calculate the moving averages over the current month and the two previous months to smooth out the volatility. This is defined by ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING, which takes three rows into account when calculating the moving average: the current row and the two rows preceding it. Depending on how you want to calculate the moving averages, you can increase or decrease the number of preceding and following rows. For example, if you write ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING, you're taking a total of six rows into account: the current row, the three rows preceding it, and the two rows following it.
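A small side note on the frame clause, beyond what this example needs: in ROWS mode, 0 FOLLOWING means the same thing as CURRENT ROW, so you will often see the same moving average written like this:

SELECT
  id,
  site,
  month,
  number_of_visits,
  AVG(number_of_visits) OVER (
    PARTITION BY site
    ORDER BY month
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- equivalent to "0 FOLLOWING"
  ) AS moving_average_visits
FROM site_visit;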
Here's my report:

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |
| 3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67 |
| 4 | E-commerceSuperSite | 04/2019 | 45,987,452 | 45,899,115.33 |
| 5 | E-commerceSuperSite | 05/2019 | 46,124,756 | 45,989,891.00 |
| 6 | E-commerceSuperSite | 06/2019 | 46,125,746 | 46,079,318.00 |
| 7 | E-commerceSuperSite | 07/2019 | 46,124,756 | 46,125,086.00 |
| 8 | E-commerceSuperSite | 08/2019 | 46,114,784 | 46,121,762.00 |
| 9 | E-commerceSuperSite | 09/2019 | 46,125,411 | 46,121,650.33 |
| 10 | E-commerceSuperSite | 10/2019 | 46,125,784 | 46,121,993.00 |
| 11 | E-commerceSuperSite | 11/2019 | 46,178,421 | 46,143,205.33 |
| 12 | E-commerceSuperSite | 12/2019 | 46,170,254 | 46,158,153.00 |
| 1 | GreatSite4U | 01/2019 | 56,789 | 56,789.00 |
| 2 | GreatSite4U | 02/2019 | 74,564 | 65,676.50 |
| 3 | GreatSite4U | 03/2019 | 85,426 | 72,259.67 |
| 4 | GreatSite4U | 04/2019 | 72,547 | 77,512.33 |
| 5 | GreatSite4U | 05/2019 | 75,000 | 77,657.67 |
| 6 | GreatSite4U | 06/2019 | 92,546 | 80,031.00 |
| 7 | GreatSite4U | 07/2019 | 89,546 | 85,697.33 |
| 8 | GreatSite4U | 08/2019 | 87,237 | 89,776.33 |
| 9 | GreatSite4U | 09/2019 | 87,412 | 88,065.00 |
| 10 | GreatSite4U | 10/2019 | 76,398 | 83,682.33 |
| 11 | GreatSite4U | 11/2019 | 69,874 | 77,894.67 |
| 12 | GreatSite4U | 12/2019 | 84,417 | 76,896.33 |
| 1 | PleaseVisit | 01/2019 | 897 | 897.00 |
| 2 | PleaseVisit | 02/2019 | 658 | 777.50 |
| 3 | PleaseVisit | 03/2019 | 2,587 | 1,380.67 |
| 4 | PleaseVisit | 04/2019 | 6,845 | 3,363.33 |
| 5 | PleaseVisit | 05/2019 | 10,254 | 6,562.00 |
| 6 | PleaseVisit | 06/2019 | 11,487 | 9,528.67 |
| 7 | PleaseVisit | 07/2019 | 13,345 | 11,695.33 |
| 8 | PleaseVisit | 08/2019 | 14,897 | 13,243.00 |
| 9 | PleaseVisit | 09/2019 | 15,497 | 14,579.67 |
| 10 | PleaseVisit | 10/2019 | 18,845 | 16,413.00 |
| 11 | PleaseVisit | 11/2019 | 28,467 | 20,936.33 |
| 12 | PleaseVisit | 12/2019 | 84,417 | 43,909.67 |

Let's analyze the result a bit to understand how the calculation works. The first moving average is the same as the number of visits. This is expected, because there is no data before this row. SQL takes the total number of visits in the frame, divides it by the number of rows available (which is one, in this case), and returns a result equal to the number of visits for the month.

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |

Did you expect the moving average in the second row to be equal to the number of visits for the month, since there's not enough history? If you did, you'd be wrong! This moving average sums the current row and the row before it, then divides the result by the number of rows actually available in the frame. The sum is divided by two, even though the frame asks for up to three rows. Let's check the result:

(45,789,465 + 45,852,429) / 2 = 45,820,947.00

It is correct!
| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |

The third row finally does what I want: it takes the current row and the two rows before it and returns the average. Let's check:

(45,789,465 + 45,852,429 + 45,857,465) / 3 = 45,833,119.67

Correct again! No need to check anything else; maybe we should just start trusting SQL!

| id | site | month | number_of_visits | moving_average_visits |
|---|---|---|---|---|
| 1 | E-commerceSuperSite | 01/2019 | 45,789,465 | 45,789,465.00 |
| 2 | E-commerceSuperSite | 02/2019 | 45,852,429 | 45,820,947.00 |
| 3 | E-commerceSuperSite | 03/2019 | 45,857,465 | 45,833,119.67 |

Calculating Running Totals

Running totals are found in many different kinds of analysis. They can be very helpful in financial analysis, for example, and SQL window functions give you the tools to calculate them easily. Running totals are also called cumulative sums, since they add the current value to the total of all previous values. This article about running totals has an approachable explanation, with several examples of how they are used. That doesn't mean I won't show you some examples too!

Example 7: Calculating Running Totals of the Debt Collected by Your Call Center

It seems every company has a call center today. Banks, other financial institutions, debt collection agencies, telecom companies, you name it. One purpose of a call center is to remind customers of their unpaid bills. You're monitoring the efficiency of the call center and want to analyze the amounts collected after the call center contacts the customers. The table is debt_collected, and its columns are as follows:

- id: the ID of the debt collected
- month: the month when the debt was collected
- amount: the amount of the debt collected

To get the running totals, you need this code:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS debt_collected_rt
FROM debt_collected;

First, I select the columns from the table debt_collected. Then I need to calculate the running totals. To do that, I use SUM() as a window function, with the column to be totaled specified in the parentheses. Then comes the OVER() clause; PARTITION BY is omitted because I want the running total over all the available data. The operation has to be performed sequentially, from January to December, and not in some random order; hence the data is ordered by month. The result appears in the column debt_collected_rt.

The result of the query looks like this:

| id | month | amount | debt_collected_rt |
|---|---|---|---|
| 1 | 01/2019 | 575,457.28 | 575,457.28 |
| 2 | 02/2019 | 578,200.85 | 1,153,658.13 |
| 3 | 03/2019 | 567,257.77 | 1,720,915.90 |
| 4 | 04/2019 | 657,452.12 | 2,378,368.02 |
| 5 | 05/2019 | 622,157.42 | 3,000,525.44 |
| 6 | 06/2019 | 608,745.47 | 3,609,270.91 |
| 7 | 07/2019 | 594,122.33 | 4,203,393.24 |
| 8 | 08/2019 | 591,114.49 | 4,794,507.73 |
| 9 | 09/2019 | 541,258.68 | 5,335,766.41 |
| 10 | 10/2019 | 584,127.11 | 5,919,893.52 |
| 11 | 11/2019 | 587,774.43 | 6,507,667.95 |
| 12 | 12/2019 | 596,471.87 | 7,104,139.82 |

Example 8: Calculating Running Totals of the Employee Costs

Let's practice a little more with a similar example. What if you're an HR manager and need to see the monthly employee costs, together with the running totals? For instance, if your task is to plan the budget for the next year, you need some historical data to help you. The previous year's budget is always a good starting point.

The data can be found in the table employee_costs, and its columns are:

- id: the ID of the employee costs
- month: the month of the employee costs
- amount: the amount of the employee costs

The code is practically the same as in the previous example:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS costs_rt
FROM employee_costs;

I've selected all the columns from the table employee_costs. The window function sums the amount month by month and puts the result in the new column costs_rt.

| id | month | amount | costs_rt |
|---|---|---|---|
| 1 | 01/2019 | 84,992.57 | 84,992.57 |
| 2 | 02/2019 | 87,562.24 | 172,554.81 |
| 3 | 03/2019 | 86,451.82 | 259,006.63 |
| 4 | 04/2019 | 86,451.82 | 345,458.45 |
| 5 | 05/2019 | 85,456.13 | 430,914.58 |
| 6 | 06/2019 | 86,782.45 | 517,697.03 |
| 7 | 07/2019 | 88,253.45 | 605,950.48 |
| 8 | 08/2019 | 88,795.64 | 694,746.12 |
| 9 | 09/2019 | 89,974.34 | 784,720.46 |
| 10 | 10/2019 | 92,444.44 | 877,164.90 |
| 11 | 11/2019 | 93,012.55 | 970,177.45 |
| 12 | 12/2019 | 93,999.14 | 1,064,176.59 |
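A possible extension, not part of the original report: if you also wanted each month's share of the annual total next to the running total, you could add a second SUM() with an empty OVER(), which makes the window span the whole table. A sketch against the same employee_costs table; the ROUND() call and the pct_of_year alias are my additions:

SELECT
  id,
  month,
  amount,
  SUM(amount) OVER (ORDER BY month) AS costs_rt,
  ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS pct_of_year  -- the month's share of the yearly total
FROM employee_costs;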
Example 9: Calculating Running Totals of Quarterly Sales

If you're a financial analyst, a sales manager, a regional manager, or any kind of manager, you've probably seen reports like this. In this scenario, you work for a company with five geographical regions. You have data on quarterly sales by region. What you want is a report showing the running total of sales separately for each region. All the data you need is in the table regional_sales. The columns are:

- id: the ID of the sales
- region: the name of the region
- quarter: the quarter of the sales
- amount: the amount of the sales

Here's a simple query that gives you the required results:

SELECT
  id,
  region,
  quarter,
  amount,
  SUM(amount) OVER (PARTITION BY region ORDER BY quarter) AS regional_sales_rt
FROM regional_sales;

As always, I select all the columns from the table. Then comes the window function part. I sum up the amount using the SUM() function. The window is defined by OVER(). There is a PARTITION BY clause this time, because I want to see the data by region. The data needs to be summed up in order by quarter, so I order the data by quarter. Finally, the resulting data is shown in the column called regional_sales_rt.

| id | region | quarter | amount | regional_sales_rt |
|---|---|---|---|---|
| 1 | Central Europe | 1Q2019 | 7,854,127.32 | 7,854,127.32 |
| 2 | Central Europe | 2Q2019 | 7,782,112.23 | 15,636,239.55 |
| 3 | Central Europe | 3Q2019 | 7,612,556.88 | 23,248,796.43 |
| 4 | Central Europe | 4Q2019 | 8,023,448.77 | 31,272,245.20 |
| 5 | Eastern Europe | 1Q2019 | 5,412,444.62 | 5,412,444.62 |
| 6 | Eastern Europe | 2Q2019 | 5,208,412.37 | 10,620,856.99 |
| 7 | Eastern Europe | 3Q2019 | 5,132,445.58 | 15,753,302.57 |
| 8 | Eastern Europe | 4Q2019 | 5,800,613.22 | 21,553,915.79 |
| 13 | Northern Europe | 1Q2019 | 3,541,222.14 | 3,541,222.14 |
| 14 | Northern Europe | 2Q2019 | 3,247,772.67 | 6,788,994.81 |
| 15 | Northern Europe | 3Q2019 | 3,456,773.29 | 10,245,768.10 |
| 16 | Northern Europe | 4Q2019 | 3,320,520.84 | 13,566,288.94 |
| 17 | Southern Europe | 1Q2019 | 1,482,222.66 | 1,482,222.66 |
| 18 | Southern Europe | 2Q2019 | 1,628,741.56 | 3,110,964.22 |
| 19 | Southern Europe | 3Q2019 | 2,208,456.03 | 5,319,420.25 |
| 20 | Southern Europe | 4Q2019 | 2,485,212.33 | 7,804,632.58 |
| 9 | Western Europe | 1Q2019 | 11,285,774.26 | 11,285,774.26 |
| 10 | Western Europe | 2Q2019 | 11,487,662.29 | 22,773,436.55 |
| 11 | Western Europe | 3Q2019 | 12,564,442.83 | 35,337,879.38 |
| 12 | Western Europe | 4Q2019 | 11,662,451.18 | 47,000,330.56 |

Do You Want to Learn SQL Window Functions?

With these nine examples, I've tried to show you various scenarios in which you could find SQL window functions helpful. What you've learned about window functions here should make it easier for you to start learning through our Window Functions course. The examples in this article don't cover everything window functions can do. Instead of overwhelming you with all their possibilities, I've presented only a handful of uses.
The point was to show you how one function can be helpful in several different jobs. As you have seen, learning about window functions is a wise move for every manager and data analyst. Of course, the usefulness of window functions doesn't stop there. Experts working in sales, retail, inventory, e-commerce, product and supply chain management, finance, and public health, among others, will find that learning about window functions makes their jobs easier.

I hope I've managed to find at least one example that relates to your everyday tasks. If I haven't, it doesn't mean SQL window functions can't be useful to you. It just reflects the wide variety of scenarios in which window functions can be used; that variety is impossible to cover with just a handful of examples. If you think a little about your day-to-day tasks, I'm sure you'll find situations where you could apply window functions.

I'd like to hear from you in the comments section. Feel free to share your experience with SQL window functions and how you use them.