29th Oct 2021 8 minutes read What Is a Rolling Average and How Do You Compute It in SQL? Ignacio L. Bisso sql learn sql window functions Table of Contents What’s a Rolling Average? Using Rolling Averages to Discover Trends in New Users The Rolling Average in Economics Moving Beyond Rolling Averages in SQL In this article, we will show three different examples of using SQL to calculate rolling averages. Learn this and you will take an important step in your SQL growth! A rolling average is a metric that allows us to find trends that would otherwise be hard to detect. It is usually based on time series data. In SQL, we calculate rolling averages using window functions. First, let’s talk about what rolling averages are and why they’re useful. What’s a Rolling Average? A rolling average is a calculation that lets us analyze data points by creating a series of averages based on different subsets of a data set. It’s also called a moving average, a running average, a moving mean, or a rolling mean. You’ll very often see rolling averages used in time series data to analyze trends, especially when short-term fluctuations can hide a longer-term trend or cycle. To show an example of a rolling average in SQL, we’ll use a stock values data set. Suppose we have a table called stock_values like the one shown below: date_timestock_price 01/04/2021 17:00100.00 01/05/2021 17:00130.00 01/06/2021 17:0090.00 01/07/2021 17:00105.00 01/08/2021 17:00110.00 01/09/2021 17:00140.00 01/10/2021 17:0087.00 01/11/2021 17:00107.00 01/12/2021 17:00147.00 01/13/2021 17:0092.00 01/14/2021 11:00110.00 01/15/2021 17:00150.00 01/16/2021 17:00155.00 01/17/2021 17:0097.00 01/18/2021 17:00112.00 01/19/2021 17:00112.00 In the next query, we’ll demonstrate how to use SQL to calculate the moving average for the column stock_price based on the three previous values and the current stock value: SELECT date_time, stock_price, TRUNC(AVG(stock_price) OVER(ORDER BY date_time ROWS BETWEEN 3 PRECEDING AND CURRENT ROW), 2) AS moving_average FROM stock_values; This SQL query uses the window function AVG() on a set of values ordered by date_time. The clause ROWS BETWEEN 3 PRECEDING AND CURRENT ROW indicates that the average must be calculated only using the stock_price values of the current row and the three previous rows. Then, for each row in the result set, the rolling average will be calculated based on a different set of four stock_price values. We can see this in the following formula: rolling_average=(stock_pricerow+stock_priceprevious_row+stock_pricerow-2+stock_pricerow-3)/4 Here’s the result of the previous SQL query. Notice that when stock values are extremely high or low, the rolling average takes much less extreme values: date_timestock_valuerolling_average 01/04/2021 17:00100.00-- 01/05/2021 17:00130.00-- 01/06/2021 17:0090.00-- 01/07/2021 17:00105.00106.25 01/08/2021 17:00110.00108.75 01/09/2021 17:00140.00111.25 01/10/2021 17:0087.00110.50 01/11/2021 17:00107.00111.00 01/12/2021 17:00147.00120.25 01/13/2021 17:0092.00108.25 01/14/2021 11:00110.00114.00 01/15/2021 17:00150.00124.75 01/16/2021 17:00155.00126.75 01/17/2021 17:0097.00128.00 01/18/2021 17:00112.00128.50 01/19/2021 17:00112.00119.00 Moving averages are widely used in financial and technical trading, such as in stock price analysis, to examine short- and long-term trends. In the next graph, we can see the stock_price curve in blue and the rolling_average curve in orange Above, we can clearly see that the rolling average has a smoother curve than the stock_price curve. Also, the running average curve shows a small uptrend that we cannot clearly see in the stock_price curve. Before going to the next section, I would like to suggest our Windows Functions SQL Online Course, where you can learn more about SQL’s analytical functions. For more background on moving averages, check out our previous post on What a Moving Average Is and How to Compute it in SQL . Using Rolling Averages to Discover Trends in New Users Many websites use the metric “new registered users” to measure the site’s performance. In this section, we’ll use rolling averages to detect trends based on the daily count of new registered users. Suppose we have a table called user_activity: user_nameactionuser_typedate_time mary1992user_registrationfree2021-08-01 11:23:00 john_sailoruser_registrationfree2021-08-01 17:33:00 mary1992passwd_changefree2021-08-03 01:22:00 florence99user_registrationfree2021-08-03 14:02:00 clair2003user_registrationfree2021-08-04 15:27:00 sailorupgrade_to_premiumpremium2021-08-05 01:18:00 florence99passwd_changefree2021-08-05 02:55:00 andy123user_creationfree2021-08-06 12:25:00 As we saw in our first example, sometimes the data in the table is in the right format to calculate rolling averages. However, in the table user_activity, we need to change the format of the table data so we can work with it. Say we want to obtain the running average of the number of new users registered each day. For this, we need a table with the columns day and registered_users. SQL has a concept called CTEs (common table expressions) that allows us to create a pseudo-table during query execution. We can then consume the CTE in the same query. Here’s an example query with a CTE: WITH users_registered AS ( SELECT date_time::date AS day, COUNT(*) AS registered_users FROM user_activity WHERE action = 'user_registration' GROUP BY 1 ) SELECT day, registered_users, TRUNC(AVG(registered_users) OVER(ORDER BY day ROWS BETWEEN 9 PRECEDING AND CURRENT ROW), 2) AS moving_average_10_days, TRUNC(AVG(registered_users) OVER(ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS moving_average_3_days FROM users_registered; The previous query can be analyzed in two parts. In blue text, we have the CTE that generates a pseudo-table called users_registered; it contains the columns day and registered_users. The second part of the query (in black text) is the calculation of the rolling average. Similarly to the first example, we use the AVG() window function and the clause OVER(ORDER BY day ROWS BETWEEN 9 PRECEDING AND CURRENT ROW). This applies the AVG() function to the current row and the nine rows before it. The query also calculates a running average for three days; the idea is to show both rolling average curves and compare how smooth they are. The result of the previous query includes data for the last 60 days; below is a partial result set: dayregistered_usersmoving_average_10_daysmoving_average_3_days 2021-08-083333.0032.33 2021-08-095936.3039.00 2021-08-106039.0050.66 2021-08-117543.2064,66 2021-08-126746.1067,33 2021-08-136849.7070.00 2021-08-145952.6064,66 2021-08-156555.0064.00 2021-08-166257.3062.00 2021-08-175760.5061.33 2021-08-186763.9062.00 2021-08-196364.3062.33 2021-08-208967.2073.00 The next image shows the curves users_registered, rolling_average_10_days, and rolling_average_3_days. We can see that the curve of rolling_average_10_days (orange line) is smoother than the rolling_average_3_days curve (grey line). Before moving to our last example, I would like to suggest these 8 Best SQL Window Function Articles. And if you frequently use window functions, I suggest my preferred SQL window function cheat sheet, which includes lots of details about window function syntax and usage. The Rolling Average in Economics In our final moving average example, we are going to analyze some economic indicators for a fictitious country. Suppose we have GDP (gross domestic product) time series data for the last 70 years. We want to know the GDP’s annual growth rate in this country and how that rate has progressed. However, in each single year there may be different factors influencing the GDP amount, such as weather, natural disasters, wars, or economic crises. Thus, we will use the rolling average GDP for 10- and 20-year periods to see the overall trend. We have a table called yearly_gdp with the columns year and amount. Below, you can see a subset of the data from 1950 to 1965: yeargdp_amount 19502396516 19511610296 19523711316 19531051886 19541113133 19552873493 19563295602 19574644432 19583312793 19592086353 19604727159 19613551490 19623282716 19633700999 19642260701 19651796435 The following SQL query gets the moving average of the GDP based on the last 10 and 20 years. Once again, we'll use the AVG() window function with the OVER clause to calculate the average for the previous 10 or 20 years. Note that we use ORDER BY to ensure that the records are arranged chronologically by year: SELECT year, gdp_amount, TRUNC(AVG(gdp_amount) OVER(ORDER BY year ROWS BETWEEN 9 PRECEDING AND CURRENT ROW )) AS rolling_average_gdp_10_years, TRUNC(AVG(gdp_amount) OVER(ORDER BY year ROWS BETWEEN 19 PRECEDING AND CURRENT ROW )) AS rolling_average_gdp_20_years FROM yearly_gdp; A partial result set is shown in the next image. For1950 to 1959, we don't have a value for the 10-year rolling average; this is reasonable, since our series began in 1950 and we don't have enough data to do a 10-year average yet. The same occurs for the 20-year running average between 1950 and 1969. yeargdp_amountrolling_average_gdp_10_daysrolling_average_gdp_20_days 19502396516---- 19511610296---- 19523711316---- 19531051886---- 19541113133---- 19552873493---- 19563295602---- 19574644432---- 19583312793---- 195920863532609582-- 196047271592842646-- 196135514903036766-- 196232827162993906-- 196337009993258817-- 196422607013373574-- 196517964353265868-- 196621992313156231-- 196750073403192522-- 196855703323418276-- 1969461463936711043140343 1970209841334082303125438 1971489939835430203289893 1973594386637612793416272 In the next graph, you can see three curves: the gdp_amount curve, the 10-year rolling average curve (which starts in 1960), and the 20-year rolling average curve. Once again, the rolling average is a smoother curve than the original raw-value curve. If we do the math to extract the GDP yearly growth rate from the 10-year rolling average curve, we’ll obtain values from 0–10 percent. However, if we extract the GDP yearly growth rate from the 20-year rolling average curve, we get values from 3–6 percent; the 20-year rolling average curve is smoother than the 10-year curve. Note that in 2000, the GDP had a big increase; however, the 10-year curve shows a small rise, while the 20-year curve maintains the same slope. Finally, I would like to mention a few words about window functions. They are extremely useful when calculating metrics (as we’ve seen) and preparing reports. If you’re not already familiar with them, I suggest the article When Do I Use SQL Window Functions? for further reading. Moving Beyond Rolling Averages in SQL In this article, we demonstrated how to use SQL to compute rolling averages for different data sets. For those people who want to learn more about SQL, I would like to suggest the LearnSQL.com Advanced SQL course. It will help you develop marketable SQL skills as you delve deeper into advanced techniques. What's more, our platform offers many ways to practice advanced SQL online. Tags: sql learn sql window functions