12th May 2022 11 minutes read What Are Window Functions in SQL? Tihomir Babic sql learn sql window functions Table of Contents Window Functions vs. Aggregate Functions (and GROUP BY) When to Use SQL Window Functions Window Function Examples The Data for the Examples Example 1: The OVER() Clause Example 2: OVER(ORDER BY) Example 3: OVER(PARTITION BY) Example 4: OVER(PARTITION BY ORDER BY) Get to Know Window Functions Better! In this article, we discuss what window functions are and how they help you do your job as a data analyst or specialist. No, window functions have nothing to do with the operating system. The window here refers to a set of rows over which the function aggregates data. To learn how window functions work, what functions there are, and how to apply them to real-world problems, it’s best to take the Window Functions course. You can find all about it here. It’s interactive, there are 218 exercises, and you only need a web browser and some basic SQL knowledge. When you finish it, you’ll know how to calculate running totals and running averages, rank data, find the best and the worst performers, analyze trends, and calculate metrics such as commission percentages. Then, you can focus on practice. Use the Window Functions Practice Set to hone your skills. The idea of data aggregation may trigger the following question: how are window functions different from aggregate functions if they both aggregate data? Window Functions vs. Aggregate Functions (and GROUP BY) Window and aggregate functions overlap to an extent. By this, I mean some aggregate functions may also be used as window functions. However, there’s much more to window functions. They also come in the form of ranking, distribution, and analytic functions. But this still doesn’t answer the question about the difference between how window and aggregate functions work. The main difference is that aggregate functions collapse all individual rows and show only the aggregated values. In contrast, window functions don’t do anything of the sort. They allow you to display individual rows and aggregate values simultaneously. This makes them a very powerful tool for creating more complex reports. Here’s how window functions vs. GROUP BY compare in practice. This leads to the next question: When are window functions used? When to Use SQL Window Functions Business uses of window functions are extensive. Some practical examples include calculating running totals (of sales, for instance), calculating averages and moving averages (of stock prices, for example), finding the highest and the lowest sales per branch or salesperson, counting the number of items sold, analyzing a time series (revenue, price, or inventory change by month, quarter, or year, for example), ranking your data (for instance, employees by salary, number of finished projects, time worked, or sales by product, branch, or country), finding the n-th value (e.g., the employee with the third-highest salary), and so on. Window Function Examples I’ll show you some of the uses here with examples to explain the syntax of window functions. You can always have the Window Functions Cheat Sheet handy in case something is not clear. Also, there’s an article with a detailed explanation of window functions, their use, and syntax. The Data for the Examples The data shows a company whose business is retail and B2B sales of electronic equipment. Their database consists of three tables. The first table is product: id – The ID of the product and the primary key (PK) of the table. product_name – The name of the product. product_type – The type of the product. idproduct_nameproduct_type 1Mobilomobile phone 2Cellularcellmobile phone 3Swipertablet 4Robo-tabtablet 5DigiPhonemobile phone The table city shows the city where each store is located: id – The ID of the city and the primary key (PK) of the table. city_name – The name of the city. The company has stores in three cities: idcity_name 1Seville 2Oviedo 3Valencia The third table is product_sales, and you can imagine what data it stores. id – The ID of the sales amount and the primary key (PK) of the table. date – The date sold. sales – The sales amount. product_id – The ID of the product and the foreign key (FK) of the table that references the table product. city_id – The ID of the city and the foreign key (FK) of the table that references the table city. Here are a handful of rows from the table: iddatesalesproduct_idcity_id 12022-04-0474,598.2211 22022-04-0410,887.2112 32022-04-0421,487.6913 42022-04-0484,120.0021 Example 1: The OVER() Clause In this example, I show you how the OVER() clause works. It’s an SQL clause that calls a window function. Without it, window functions do not work. Other clauses may be part of the OVER() clause. To get you started, I'll show you only the OVER() clause without additional clauses. Then, we build on that in the subsequent examples. In this first example, I want to show the sales amount by product, the average sales amount by product, and how each line of sales compares with the average. This metric shows me how much (in percentage) an individual line of sales is above or below the average. I only want to show the product with the ID = 1 and sales on 4 April 2022. The code that gives me the desired output is: SELECT city_id, city_name, product_name, sales, AVG(sales) OVER() AS avg_sales, sales/AVG(sales) OVER()*100 - 100 AS sales_over_average FROM product p JOIN product_sales ps ON p.id = ps.product_id JOIN city c ON c.id = ps.city_id WHERE p.id = 1 AND date = '2022-04-04'; The columns city_id, city_name, product_name, and sales are selected. To get the average sales, I need to use the AVG() function. It is then followed by the OVER() clause with its empty parentheses. If you don’t specify anything in the OVER() clause, the window function is applied over the whole result set. To compare each line of sales amount with the average, I divide the column sales by the column avg_sales. Since SQL doesn’t allow me to call the alias of an expression, I need to write the window function again. I also want to see a percentage, so I multiply the result by 100. Then, I subtract 100 from the result because I want to see by what percent it is higher or lower than the average. I need to filter the data according to the product ID and the date sold. Here’s what the code returns: city_idcity_nameproduct_namesalesavg_salessales_over_average 1SevilleMobilo74,598.2235,657.71109.21 2OviedoMobilo10,887.2135,657.71-69.47 3ValenciaMobilo21,487.6935,657.71-39.74 Instantly, you see the main characteristic of window functions: the table shows both individual and aggregate values. How do you read this result? There are three rows because the company has stores in three cities. So, the sales amounts on 4 April 2022 for the Mobilo product were 74,598.22 in Seville, 10,887.21 in Oviedo, and 21,487.69 in Valencia. Next, I’m interested in the average sales. The average sales of Mobilo on 4 April 2022 in all three cities was 35,657.71. The sales_over_average column tells me the sales amount of 74,598.22 in Seville was 109.21% above the average for the day. For Oviedo, it was 69.47% below the average, while in Valencia, the sales amount of 21,487.69 was 39.74% below the average. Example 2: OVER(ORDER BY) Unlike the ordinary ORDER BY used commonly in a SELECT statement, the ORDER BY clause in window functions does not order the rows in the output. Instead, ORDER BY here means the order of the data over which the window function is executed. A great way to show this is using the RANK() function. I use it to find the top ten sales amounts: SELECT product_name, date, sales, RANK() OVER (ORDER BY sales DESC) AS sales_rank FROM product p JOIN product_sales ps ON p.id = ps.product_id LIMIT 10; I’m interested to see the product, the date sold, and the individual lines of sales. To rank them, I use the RANK() window function. Here, the parentheses of the function are empty. Then, as usual, comes the OVER() clause, which indicates the presence of a window function. The ORDER BY clause specifies the order in which the function works. In my case, I want to rank the rows by sales amount. I need the top ten sales amounts, so I rank them in descending order and limit the output to ten rows. product_namedatesalessales_rank Swiper2022-04-0492,784.651 Cellularcell2022-04-0484,741.652 Cellularcell2022-04-0484,120.003 Mobilo2022-04-0474,598.224 Robo-tab2022-04-0767,800.085 Robo-tab2022-04-0866,507.776 Mobilo2022-04-0565,124.557 Robo-tab2022-04-0964,777.148 Swiper2022-04-0563,636.559 Robo-tab2022-04-0757,801.8010 The output shows the highest sales amount was 92,784.65 on 4 April 2022 of the Swiper product. The second and third highest sales amounts were of Cellularcell, both on 4 April 2022. They were 84,741.65 and 84,120.00, respectively. Example 3: OVER(PARTITION BY) The PARTITION BY clause in window functions is used to define how rows of data are divided into groups. It’s similar to the GROUP BY clause, but PARTITION BY does not collapse the individual rows when grouping the data. If you want to use a PARTITION BY clause, it has to be part of an OVER() clause. In this example below, I show you how to output the sales amount by city. To do what I want, I use the SUM() window function. SELECT product_name, date, city_name, sales, SUM(sales) OVER(PARTITION BY city_name) AS total_sales FROM city c JOIN product_sales ps ON c.id = ps.city_id JOIN product p ON p.id = ps.product_id ORDER BY city_name, date, product_name; First, I select the product name, the date sold, the city, and the sales amount. Remember what I said earlier: a window function uses the whole table if only OVER() is specified. I don’t want that here. I want to see the sales amount by city. The PARTITION BY clause is how I achieve this. First, I tell the SUM() function to sum the sales amounts. Next comes the OVER() clause to transform the aggregate function into a window function. Then, there is a PARTITION BY clause. I partition the data by the column city_name, which results in the window function returning the sum of the sales amounts by city. Here’s the result. Since it has many rows, here are just a few rows for each city. product_namedatecity_namesalestotal_sales Cellularcell2022-04-04Oviedo84,741.65820,834.31 DigiPhone2022-04-04Oviedo56,999.35820,834.31 Mobilo2022-04-04Oviedo10,887.21820,834.31 Robo-tab2022-04-04Oviedo33,214.65820,834.31 Swiper2022-04-04Oviedo4,521.09820,834.31 …………… Cellularcell2022-04-04Seville84,120.001,046,700.24 DigiPhone2022-04-04Seville54,177.121,046,700.24 Mobilo2022-04-04Seville74,598.221,046,700.24 Robo-tab2022-04-04Seville12,577.771,046,700.24 Swiper2022-04-04Seville92,784.651,046,700.24 …………… Cellularcell2022-04-04Valencia35,487.12708,418.52 DigiPhone2022-04-04Valencia18,474.44708,418.52 Mobilo2022-04-04Valencia21,487.69708,418.52 Robo-tab2022-04-04Valencia9,456.12708,418.52 Swiper2022-04-04Valencia16,547.98708,418.52 This table shows all the sales amounts by city. The column total_sales shows that the total sales amount for all products in Oviedo was 820,834.31. At the same time, individual lines of sales by date and product are also shown. The total sales amount in Seville was 1,046,700.24, while in Valencia, it was 708,418.52. Example 4: OVER(PARTITION BY ORDER BY) Now, I’ll use all three clauses simultaneously to show you what they can do. This next example shows you how to calculate the cumulative sum (or running total) of sales by product and city. The cumulative sum takes the sum of all the previous rows and adds the value in the current row. You’ll see what I mean when I show you the output of this code: SELECT product_name, date, city_name, sales, SUM(sales) OVER (PARTITION BY city_name, product_name ORDER BY date) AS cumulative_sum FROM product p JOIN product_sales ps ON p.id = ps.product_id JOIN city c ON ps.city_id = c.id; Here, I join all three tables to get the following columns: product_name, date, city_name, and sales. To get the cumulative sum, you need to use the SUM() function. Again, use the OVER() clause to make it a window function. I want to see the cumulative sales by product and by city. This is why I specified these two columns in the PARTITION BY clause. Also, it makes sense that the cumulative sales are calculated from the oldest to the newest date, hence the ORDER BY date. It does so in ascending order. This output has many columns, so here is a partial look: product_namedatecity_namesalescumulative_sum Cellularcell2022-04-04Oviedo84,741.6584,741.65 Cellularcell2022-04-05Oviedo44,741.82129,483.47 Cellularcell2022-04-06Oviedo15,744.25145,227.72 Cellularcell2022-04-07Oviedo16,698.56161,926.28 Cellularcell2022-04-08Oviedo27,504.06189,430.34 Cellularcell2022-04-09Oviedo24,888.55214,318.89 DigiPhone2022-04-04Oviedo56,999.3556,999.35 DigiPhone2022-04-05Oviedo54,741.88111,741.23 DigiPhone2022-04-06Oviedo8,499.62120,240.85 DigiPhone2022-04-07Oviedo49,078.41169,319.26 DigiPhone2022-04-08Oviedo19,844.55189,163.81 DigiPhone2022-04-09Oviedo5,148.26194,312.07 Now you see how the cumulative sum works. The sales amount in the first row is 84,741.65, and the cumulative sum is the same value because there are no previous rows. The next cumulative sum is: 84,741.65 + 44,741.82 = 129,483.47. You go on until you reach the sales amount of Cellularcell on 9 April 2022 in Oviedo, which is 214,318.89. The next row is also sales for Oviedo, but this time it’s the sales of DigiPhone. In my code, I partition the data by city and product. This means when the window function reaches the next product sold in the same city, the cumulative sum restarts. This is evident because the sales amount of DigiPhone on 4 April 2022 was 56,999.35, the same as the cumulative sales amount. Then, when the same product is sold on 5 April 2022, the cumulative sum is 56,999.35 + 54,741.88 = 111,741.23, and so on. If you want more, there are additional window functions examples. Some of them are even in the form of job interview questions. It comes in quite handy if you’re looking for a job! Get to Know Window Functions Better! This article on SQL window functions was intended only as an overview. It gives you a taste of what they are, what they do, how their syntax works, and how you may use them in your everyday job. You can easily build on what you have learned here. More knowledge and practical examples await you in the Window Functions course! Tags: sql learn sql window functions