27th Feb 2024 10 minutes read 6 Examples of NTILE() Function in SQL Ignacio L. Bisso sql learn sql Table of Contents What Are SQL Window Functions? Example Dataset: A Soccer Store Example #1: Dividing Rows into 2 Groups Example #2: Distribution of 2023 Sales Example #3: Monthly Sales for Each Category and Team Example #4: Teams’ Lowest and Highest Sales Example #5: Christmas Sales Behavior by Soccer Team Example #6: Obtain Evenly Distributed Groups of Customers Continue Learning About NTILE() and Other SQL Window Functions The SQL NTILE() function can greatly simplify your data analysis and reporting. Follow these six examples to learn what it is and when to use it. Window functions are very useful, providing great expressive power to SQL. In this article, we discuss the NTILE() function, which allows us to divide a set of records into subsets of approximately equal size. This function is widely used in financial or economic calculations. Before we get into the NTILE() function, let’s quickly review window functions in SQL. What Are SQL Window Functions? Window functions are functions that work on a group of rows called a window; they return a value based on that group. The NTILE(N) function receives an integer parameter (N) and divides the complete set of rows into N subsets. Each subset has approximately the same number of rows and is identified by a number between 1 and N. This ID number is what NTILE() returns. If you want to learn SQL window functions, I recommend our interactive course. It contains 218 hands-on exercises to help you practice different window functions. At the end, you'll feel comfortable using this advanced SQL technique. If you want to see window functions in action, check out our article SQL Window Function Example with Explanations. And if you want to explore the differences between GROUP BY and window functions, read SQL Window Functions vs. GROUP BY. Example Dataset: A Soccer Store In the rest of the article, we will base our examples on the database of a fictional soccer store. We’ll focus on the sales table, which has the following structure and data: customer_idsale_dateamountproduct_idproduct_categorysoccer team 1142024-01-2720.001083AccessoryRiver 1302023-12-18150.001002ShirtBarcelona 1192023-12-0115.001002AccessoryBarcelona 1072023-12-23145.001011ShirtManchester 1042023-12-1210.001003AccessoryPSG 1352023-12-24185.001002ShirtBarcelona 1232023-12-24135.001012ShirtBarcelona 1132023-12-24110.001022ShirtBarcelona I think everything in this table is self-explanatory, so let’s move on to the examples. Example #1: Dividing Rows into 2 Groups We will start with a very simple example. We want to divide the rows in the sales table into two groups: group #1 and group #2. Then the following query does that: SELECT NTILE(2) OVER() AS group, sale_date, product_id, soccer_team FROM sales; In the query, the NTILE(2) OVER() expression returns 1 for the first 50% of the rows in the result set and 2 for the second 50% of the rows. The rows are assigned to each group in a non-deterministic way – i.e. there aren’t any criteria to assign rows to any particular group. Here’s a partial result of the query, showing each group in a different color: groupsale_dateproduct_idsoccer_team 12024-01-121083River Plate 12023-12-181002Barcelona … 12023-12-011002Barcelona 12023-12-231011Manchester 22023-12-121003PSG 22023-12-241002Barcelona … 22023-12-241012Barcelona 22023-12-241022Barcelona Example #2: Distribution of 2023 Sales The table sales stores records for each completed sale. The marketing department is interested in analyzing the sales distribution based on the amount spent. They asked for a report grouping all the sales of the year 2023 into four groups of the same size (the number of sales in each group must be the same). Each sale must be assigned based on the sale amount. The first group (sale_group #1) should have the sales with the lowest amount, and the last group (sale_group #4) should have sales with the highest amount. For each sale, the report should include the sale group number, the customer_id, the product_id, and the soccer_team. The query to obtain this result is the following: SELECT NTILE(4) OVER ( ORDER BY amount ) AS sale_group, product_id, product_category, soccer_team, amount as sales_amount FROM sales WHERE sale_date >= '2023-12-01' AND sale_date <= '2023-12-31'; This query uses a WHERE clause to filter for sales that occurred in 2023. Then the NTILE(4) function tries to create four groups of rows with the same number of rows each. We use “tries” because it isn’t always possible to create groups of the same size; some groups may have one row less than the others. How do we define which group each row belongs to? The OVER (ORDER BY amount) clause indicates that, before rows are assigned groups, all rows must be ordered based on the sale amount. Once sorted, the NTILE(4) function will take the first quarter of the sales and assign them the value 1, then take the next quarter of the sales and assign them the value 2, and so on. Below is a partial result showing records in each sale_group: sale_groupproduct_idproduct_categorysoccer_teamsale_amount 11003AccessoryPSG10.00 11002AccessoryBarcelona15.00 … 21083AccessoryRiver20.00 21022ShirtBarcelona110.00 … 31012ShirtBarcelona135.00 31011ShirtManchester145.00 … 41002ShirtBarcelona150.00 41002ShirtBarcelona185.00 … We can see that in the cheapest group (1), we only have sales from the Accessories category. This is because accessory products are usually cheapest, while shirts are usually more expensive. We can also see that products for Barcelona are in all four sale groups, which suggests that this team has an offer at all price points. Example #3: Monthly Sales for Each Category and Team In the next query, we will create a report on monthly sales. The marketing department wants to divide monthly sales for each product category and team into four groups. The first group will have the product categories, soccer teams, and months with the lowest total sales. The next group will have the next level of total_sales, and so on. This way, the marketers will be able to analyze the distribution of sales in different months and categories. Below is the query: WITH monthly_sales_stats AS ( SELECT EXTRACT(MONTH FROM sale_date) as month, product_category, soccer_team, SUM(amount) AS total_sales FROM sales WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31' GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team ) SELECT NTILE(4) OVER ( ORDER BY total_sales ) AS group, month, product_category, soccer_team, total_sales FROM monthly_sales_stats ORDER BY group DESC, total_sales DESC; The query has a common table expression (CTE) called monthly_sales_stats, which calculates the total sales for each combination of month, product_category, and soccer_team. Note that we filter the rows of the year 2023 in the WHERE clause. The expression EXTRACT(MONTH FROM sale_date) is used to obtain the month value in the field sale_date. In the main query, we call NTILE(4) OVER (ORDER BY total_sales) to assign to each row of the CTE a group number. Below is a partial result of the query showing the first three rows of each group: groupmonthproduct_categorysoccer_teamtotal_sales 412ShirtBarcelona1158.00 49ShirtReal Madrid755.00 412ShirtManchester433.00 … 34ShirtReal Madrid225.00 312ShirtRiver220.00 33ShirtBarcelona210.00 … 22ShirtBarcelona115.00 22ShirtReal Madrid105.00 26ShirtRiver100.00 … 111AccessoryBarcelona30.00 16AccessoryReal Madrid30.00 19AccessoryBarcelona25.00 … In the result, you can see the highest sales for each soccer team are in December, probably because of Christmas shopping. The category “Accessory” is at the end of the result table, because accessories are usually lower in price. Example #4: Teams’ Lowest and Highest Sales Like other window functions, you can use NTILE() with the PARTITION BY clause. Here’s an example. The marketing team wants to investigate how sales are distributed within articles for each soccer team. The idea is to split each team’s sales into sets based on amount. Once again, we will put the lowest sales in the first set and then work up to the highest sales in the fourth set. This is the query: SELECT soccer_team, NTILE(4) OVER (PARTITION BY soccer_team ORDER BY amount) AS group_number, product_id, product_category, amount FROM sales ; The query is very simple; it has a SELECT with a list of columns and a FROM with a table name. The NTILE() function uses PARTITION BY and ORDER BY. The PARTITION BY soccer_team puts all the rows with the same value in soccer_team in the same partition. The ORDER BY amount orders the records in each team’s set, putting the ones with the lowest values first. Then NTILE(4) returns 1 for the first 25% of the rows in the group of rows, 2 for the second 25% of the rows in the group, and so on. Below are some partial results: soccer_teamgroup_numberproduct_idproduct_categoryamount Barcelona11028Accessory10.00 Barcelona11027Accessory15.00 Barcelona11002Accessory15.00 Barcelona11025Accessory20.00 …… Barcelona21022Shirt100.00 Barcelona21023Shirt110.00 Barcelona21024Shirt115.00 Barcelona21023Shirt115.00 … Barcelona31035Shirt115.00 Barcelona31032Shirt120.00 Barcelona31036Shirt120.00 Barcelona31026Shirt128.00 … Barcelona41002Shirt150.00 Barcelona41004Shirt155.00 Barcelona41012Shirt170.00 Barcelona41013Shirt185.00 … Manchester11028Accessory20.00 Manchester11025Accessory20.00 Manchester11024Accessory25.00 … Manchester21022Shirt105.00 Manchester21032Shirt110.00 Manchester21035Shirt110.00 … Manchester31024Shirt115.00 Manchester31022Shirt115.00 Manchester31023Shirt118.00 … Manchester41033Shirt120.00 Manchester41011Shirt145.00 Manchester41012Shirt178.00 … Example #5: Christmas Sales Behavior by Soccer Team This example is very similar to Example 3, with the difference that the NTILE() function uses a PARTITION BY soccer_team subclause. This means that NTILE() will create groups of rows for each soccer_team instead of creating groups from the complete result set (as in Example 3). As a result, each soccer team will have four sets. The query is: WITH monthly_sales_stats AS ( SELECT EXTRACT(MONTH FROM sale_date) as month, product_category, soccer_team, SUM(amount) AS total_sales FROM sales WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31' GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team ) SELECT NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group, month, product_category, soccer_team, total_sales FROM monthly_sales_stats ORDER BY total_sales DESC; The CTE monthly_sales_stats is exactly the same as in the previous example. It has a GROUP BY clause that helps calculate the total amount of sales for each combination of month, product_category, and soccer_team. After creating the CTE, we write a SELECT with the following NTILE() expression: NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group This expression splits the CTE rows into sets that have the same value in the soccer_team field. For each soccer team set, NTILE(4) tries to create four equal subsets, assigning each row a value from 1 to 4. The rest of the columns in the result come from the CTE. Below are partial results showing two rows for each group for the Barcelona and Manchester teams. groupmonthproduct_categorysoccer_teamtotal_sales 112ShirtBarcelona1158.00 16ShirtBarcelona360.00 … 23ShirtBarcelona340.00 27ShirtBarcelona225.00 … 310ShirtBarcelona115.00 31ShirtBarcelona115.00 …115.00 49AccessoryBarcelona25.00 410AccessoryBarcelona20.00 … 112ShirtManchester433.00 16ShirtManchester340.00 … 24ShirtManchester210.00 29ShirtManchester155.00 … 35ShirtManchester120.00 39ShirtManchester115.00 … 43AccessoryManchester30.00 411AccessoryManchester30.00 … Example #6: Obtain Evenly Distributed Groups of Customers Let’s suppose the marketing department wants to create three evenly distributed groups of customers to run three different marketing campaigns; each campaign will be targeted to one group. The customers in each group are chosen randomly. Then the marketing department will compare campaign results and evaluate which campaign is better. To simulate a random selection of customers, one idea is to use the seconds of the last time each customer bought something. And ordered by the seconds, we will create three groups of customers. Let’s see the query: WITH customer_last_transaction_timestamp AS ( SELECT customer_id, max(sales_date) AS last_ts FROM sales GROUP BY customer_id ) SELECT NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts)) AS group_number, customer_id FROM customer_last_transaction_timestamp ORDER BY group_number; The previous query returns all the customers with a group_number from 1 to 3; this represents the marketing group to which the customer was assigned. The CTE customer_last_transaction_timestamp stores every customer with their last transaction timestamp (obtained with MAX(sales_date)). The main query uses the NTILE(3) function to create three customer groups of approximately the same size: NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts)) The above statement returns 1 for the 33% of the customers with a timestamp value in the lower third. It returns 2 for the next 33 percent, and so on. The subexpression EXTRACT(SECOND FROM last_ts) takes the seconds part (i.e. 22) from a timestamp (ie. ‘2023-03-30 10:30:22’). Below is a partial result: group_numbercustomer_id 1111 1135 2123 2154 3108 3104 Continue Learning About NTILE() and Other SQL Window Functions In this article, we showed several examples of using the NTILE() window function. We also demonstrated different OVER clauses. The OVER clause is common to all SQL’s window functions. If you want to get some hands-on experience with these commands, I suggest our interactive Window Functions course. If you are a frequent user of window functions our free SQL Window Functions Cheat Sheet is a very useful resource. In fact, I have it on the wall of my office, ready to use when I have doubts about syntax. I strongly recommend it. Finally, the article Top 10 SQL Window Functions Interview Questions is great if you have a job interview and want to be prepared for SQL topics. Happy learning, and keep progressing with SQL window functions! Tags: sql learn sql