29th Sep 2020 11 minutes read Six Window Function Templates to Save in Your Code Editor Tihomir Babic sql learn sql window functions Table of Contents What Are Window Functions? Template 1: Ranking the Rows Template 2: Difference Between two Consecutive Periods Template 3: Percent Difference Between two Consecutive Periods Template 4: Calculating the Running Total Template 5: Calculating the Moving Average Template 6: Calculating the Last/First Value in Each Group Interested in the Possibilities of the SQL Window Functions? Do you use window functions often enough to know them well but not often enough to remember all the details? Here are six window function codes you should save as templates. Window functions are a handy SQL option, but it doesn’t mean you use them all the time. Maybe you only use them occasionally; if that’s the case, you probably find yourself needing to refresh your memory every time you use them. It’s easy to forget the nuances of the functions, and you’re pulling your hair out and wondering why the code isn’t working. They say it’s always better to write something down, not remember it! In the world of SQL, writing down equals saving your templates in the code editor. Why shouldn’t you make your life easier and have a cheat sheet for window functions? Speaking of cheat sheets, there’s one already. You can print it and have it by your side; that way, you can quickly remember the syntax and how any specific function works. Everything in this cheat sheet, you can also learn and practice in our course on window functions. Having a cheat sheet is good but having it in your code editor is even better. What Are Window Functions? You probably remember that window functions do calculations across a set of table rows. This set of rows is called a window or a window frame. All this and much more is explained in a very accessible way in this article. A window function syntax looks like this: window_function ([ALL] expression) OVER ( [PARTITION BY partition_list] [ORDER BY order_list] [window_frame_clause] ) You see that the window function syntax is rather complicated. It’s easy to forget the details if you’re not using it very often—especially the nuances that differentiate one function from another. Having templates in your code editor is a handy way to write the code much quicker. I won’t go into further explanation of the syntax, but I won’t leave you in the lurch, either. Here’s an article explaining the window function syntax and all its elements. Now, on to the templates! Template 1: Ranking the Rows By using the RANK() function, I can rank anything I want. You’ll probably use it pretty often; we live in a world of top lists. Just take a look at the title of this article. You just can’t escape the top five this, the best ten that, or the worst fifteen something. The business world is no different, so it’s best you have your ranking functions ready. Ranking functions? Yes, plural. Aside from RANK(), there are three other ranking functions: DENSE RANK ROW_NUMBER NTILE More on these ranking functions can be found in an article dedicated solely to the ranking functions. Here is the table rich_people, containing the ten wealthiest people in the world in no particular order. The table includes the following columns: id: the ID of the person first_name: the first name of the person last_name: the last name of the person net_worth: the net worth of the person in billions of dollars How would you select only the five richest people and rank them from the richest to the least rich? Here’s how: SELECT * FROM ( SELECT RANK() OVER (ORDER BY net_worth DESC) AS row_rank, first_name, last_name, net_worth FROM rich_people ) AS net_worth_rank WHERE row_rank <= 5; In this code, I use a subquery to get the desired result. This subquery contains a RANK() function, ordering the rows in a descending order by the column net_worth. The rank of the rows will appear in the new column row_rank. It also selects the columns from the table rich_people. The subquery used in the FROM clause has to have an alias; here, it’s net_worth_rank. Since I want only the top five rich people, there is a condition WHERE row_rank <= 5. Run the query to get the following result: row_rankfirst_namelast_namenet_worth 1NaomiMarken644.58 2KarelCharapek612.33 3JebediahVaultory562.84 4Jean-JacquesQuichelorraine499.51 5Vicky ChristinaBarcelona425.37 Store this query in your code editor, and you’ll always have a ranking code ready to play with and adapt to your needs. I didn’t use PARTITION BY in this code, which you may need sometimes. If you’re not sure how to use it, this article talks about ranking and using PARTITION BY. Template 2: Difference Between two Consecutive Periods This is another common use of SQL window functions. It’s also common in any data-related job to calculate daily, monthly, quarterly, or any other differences between consecutive periods. I’m going to show you how to use window functions to do this easily. There’s a table country_gdp with the following columns: country: the name of the country GDP: the GDP of the country in billions of dollars year: the year of the GDP Your task is to calculate the difference in GDP (increase/decrease) between the years. You could do that by using the window function LAG() like this: SELECT country, GDP, year, GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) AS GDP_difference FROM country_gdp; The code selects all three columns from the table country_gdp. To calculate the difference, you need to take the GDP of one year and subtract from it the GDP of the previous year. The LAG() function helps you achieve that; it returns the value of the previous row. This means the part GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) in the code is nothing more than the current GDP minus the previous GDP, since LAG() takes the GDP value from the previous row. We need to do this by country, hence PARTITION BY country. The calculation needs to be performed for every year starting from the earliest year to the most recent, i.e., in ascending order, so it’s ordered by the column year. The result will appear in the column GDP_difference. And here’s the result: countryGDPyearGDP_difference Quazistan585.792017NULL Quazistan584.592018-1.2 Quazistan562.552019-22.04 Snowland185.682017NULL Snowland190.5720184.89 Snowland180.562019-10.01 Sunnylandia270.122017NULL Sunnylandia276.5620186.44 Sunnylandia294.22201917.66 We have a NULL value for every country in 2017, because this is the starting year. There is no previous row we can use to subtract from the 2017 value. Template 3: Percent Difference Between two Consecutive Periods This is a variation of the previous template. However, it’s nice to have them saved separately, because sometimes you just need the percentage. This way, there’s no need to tweak the previous query; you simply run the query below. Using the same data from the example above, how would you calculate the percent difference in GDP? Let’s take Quazistan as an example. countryGDPyearGDP_difference Quazistan585.792017NULL Quazistan584.592018-1.2 In this case, the percent difference should be (584.59-585.79)/585.79*100. It’s written in SQL like this: SELECT country, GDP, year, ((GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year))/LAG (GDP) OVER (PARTITION BY country ORDER BY year)*100) AS GDP_percent_difference FROM country_gdp; The code is almost identical to that of the previous example. Except this time, we don’t just subtract one value from another using the LAG() function. The difference is then divided by the GDP of the previous year, again using the LAG() function, after which the result is multiplied by 100 to get the percentage. This percentage shows up in the column GDP_percent_difference. Run the code to get this result: countryGDPyearGDP_percent_difference Quazistan585.792017NULL Quazistan584.592018-0.204851568 Quazistan562.552019-3.770163704 Snowland185.682017NULL Snowland190.5720182.633563119 Snowland180.562019-5.252663063 Sunnylandia270.122017NULL Sunnylandia276.5620182.384125574 Sunnylandia294.2220196.385594446 Template 4: Calculating the Running Total The running total is the same as the cumulative sum. While it’s not very advanced mathematics to say the least, you’ll often run into it when looking at data. Fortunately, this calculation becomes even easier by using window functions. You have a table named subscribers, containing the monthly number of subscribers to your YouTube channel from three cities. These are the columns: city: the name of the city new_subscribers: the number of new subscribers month: the month of subscription How would you calculate the running total of your subscribers by city? Here’s how: SELECT city, new_subscribers, SUM (new_subscribers) OVER (PARTITION BY city ORDER BY month) AS cumulative_subscribers, month FROM subscribers; Apart from selecting all the columns from the table subscribers, what does this code do? It calculates the running total and shows the result in the column cumulative_subscribers. To do that, I use the SUM() window function. This function summarizes the column new_subscribers, defined in the parentheses. Then, the regular part of the window function follows. We need the result by city, so this is specified by the PARTITION BY. The operation needs to be performed for each month starting from the oldest, which can be done by ordering the rows by the column month. There you have it, a lovely little template giving you the following result: citynew_subscriberscumulative_subscribersmonth Amsterdam41241204/2020 Amsterdam24165305/2020 Amsterdam31296506/2020 Warsaw10810804/2020 Warsaw9119905/2020 Warsaw12432306/2020 Zagreb282804/2020 Zagreb396705/2020 Zagreb5011706/2020 Template 5: Calculating the Moving Average The moving average has several pseudonyms. If someone refers to the rolling mean, the rolling average, or the running average, don’t worry; it’s the same thing as the moving average. Different names, same calculation. The moving average is often used in analyzing time series data to determine trends. This article will help you understand what the moving average is for those of you who are not familiar with it. Here’s an example of calculating the moving average in SQL. Feel free to store it in your code editor, adapt it to your needs, and use it whenever needed. The table for this example is eur_usd, which contains historical exchange rates from the EUR to the USD in the following columns: date: the date of the exchange rate exchange_rate: the value of the exchange rate To calculate the three-day moving average of the exchange rate, you’ll need the following code: SELECT date, exchange_rate, AVG (exchange_rate) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average FROM eur_usd; The code first selects the columns from the table eur_usd. There’s an AVG() function calculating the average of the exchange rate. This calculation needs to be performed sequentially, so I’ve ordered it by date. The steps to calculate the three-day moving average is to select the values from the current row and the previous two, sum them, then divide the sum by three, i.e., the number of rows taken into account. This is done by the part of the code: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. You’ll see the result in the column moving_average; here it is: dateexchange_ratemoving_average 2020-09-011.19211.1921 2020-09-021.18531.1887 2020-09-031.18471.187366 2020-09-041.18381.1846 2020-09-051.18381.1841 2020-09-061.18431.183966 2020-09-071.18161.183233 Template 6: Calculating the Last/First Value in Each Group This is a convenient feature of window functions, allowing you to group the data and then find the row with the smallest or the largest value. It’s also possible to do that with a GROUP BY, but only if you’re using MySQL. What if you’re not? The functions LAST_VALUE and FIRST_VALUE are what you need. How do they work? I’ll show you an example, using the product_sold table. It contains the following columns: product: the name of the product product_category: the category of the product items_sold: the number of product items sold SELECT product, product_category, items_sold, LAST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bestseller FROM product_sold; What does this window function do? I want the best-selling product, that is, the largest number of items sold, so I define it by LAST_VALUE (product). Of course, this needs to be done by category, so I specify that with PARTITION BY product_category. What are the criteria for the product being the best-selling? It's by the number of items sold. That’s why we have ORDER BY items_sold; the value of the sold items in the last row represents the largest volume. After that, I define the frame by writing ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; this means the frame starts at the first row and ends at the last row. This is the result of the query: productproduct_categoryitems_soldbestseller CD51Cable13CD61 CD31Cable100CD61 CD41Cable417CD61 CD61Cable2541CD61 MX001Headphones59MX500 MX202Headphones658MX500 MX420Headphones1586MX500 MX500Headphones2587MX500 BX1AMicrophone354BX1B BX1DMicrophone412BX1B BX1CMicrophone587BX1B BX1BMicrophone954BX1B The logic is similar if you want to calculate the product that sold the least in each category. In that case, you use FIRST_VALUE() function. SELECT product, product_category, items_sold, FIRST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold) AS worstseller FROM product_sold; This is almost identical to the previous code. Instead of LAST_VALUE(), there’s FIRST_VALUE(). The window is again defined by product_category, and the rows are ordered by items_sold. Here, you can omit the part ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The result of this query looks like this: productproduct_categoryitems_soldworstseller CD51Cable13CD51 CD31Cable100CD51 CD41Cable417CD51 CD61Cable2541CD51 MX001Headphones59MX001 MX202Headphones658MX001 MX420Headphones1586MX001 MX500Headphones2587MX001 BX1AMicrophone354BX1A BX1DMicrophone412BX1A BX1CMicrophone587BX1A BX1BMicrophone954BX1A Interested in the Possibilities of the SQL Window Functions? These six are probably the most commonly used window functions. This is only to show you how applicable they are in business and everyday life. There’s much more to window functions, which you can discover in the window functions course. If you want to learn about window functions and are interested in what this course can offer you, read the interview with the course creator herself. Feel free to use the comments section! Let me know which window functions you have already saved in your code editor. Tags: sql learn sql window functions