10th Nov 2022 10 minutes read How to Use GROUP BY in SQL Tihomir Babic sql GROUP BY Table of Contents How Does GROUP BY Work? An Example Steps to Use GROUP BY Other Tips for Using GROUP BY in SQL Use a Unique Identifier Grouping by Value Two Columns as the Group Definer Grouping by an Expression Steps to Writing a Query With GROUP BY and Aggregate Functions No SQL User Can Avoid Using GROUP BY In this article, we’ll give you a step-by-step guide on using GROUP BY in SQL. Uncover the nuances of the GROUP BY clause and learn different ways of using it. GROUP BY is one of the most-used clauses in SQL. It moves you from simply selecting data from the database to grouping rows with the same column values into one group. When used with SQL’s aggregate functions, you can use GROUP BY to compute metrics like counting the number of instances or finding the total, average, minimum, or maximum value. GROUP BY is part of basic SQL knowledge; you need to feel really comfortable with it before you go on to more complicated concepts. You already know that writing code is best learned through practice. Finding the opportunity to write SQL code regularly might be tricky if you’re not working with SQL daily. Hard, but not impossible. What makes it possible is the SQL Practice Set course. It 88 exercises offer plenty of opportunities to practice GROUP BY and some other SQL concepts, such as aggregate functions, JOINs, and subqueries. How Does GROUP BY Work? An Example I’ll use the table typewriter_products to demonstrate the GROUP BY clause’s importance in SQL. It shows things you can buy from a fictional company selling typewriter-related products. For those who might wonder, this is a typewriter: Source: https://i.etsystatic.com/8429430/r/il/132716/1789166606/il_fullxfull.1789166606_1qnc.jpg Almost like a computer without a screen. Who knows how many important books were written using a typewriter. Books, remember those? (But enough already with the smugness! Talk about the table!) Yes, the table. It has the following columns: id – The record’s unique ID. product_name – The product’s name. product_id – The product’s ID. ribbon_brand – The brand of the typewriter ribbon. typewriter_brand – The brand of typewriter the ribbon is made for. ribbon_color – The typewriter ribbon’s color. units – The product’s number of units available. price – the product’s price per unit. The data itself is shown below. idproduct_nameproduct_idribbon_brandtypewriter_brandribbon_colorunitsprice 1typewriter ribbon1All You NeedOlympiaBlack8810.00 2typewriter ribbon1All You NeedOlympiaBlack + Red4810.00 3typewriter ribbon2Ribbons & UsOlympiaBlack9712.39 4typewriter ribbon2Ribbons & UsOlympiaBlack + Red14715.15 5typewriter ribbon3All You NeedUnderwoodBlack1424.74 6typewriter ribbon3All You NeedUnderwoodBlack + Red1325.17 7typewriter ribbon4Our RibbonUnderwoodBlack5425.00 8typewriter ribbon5Ribbons & UsUnderwoodBlack15730.47 9typewriter ribbon5Ribbons & UsUnderwoodBlack + Red1428.47 10typewriter ribbon6All You NeedAdlerBlack4420.00 11typewriter ribbon6All You NeedAdlerBlack + Red1630.00 12typewriter ribbon7Ribbons & UsAdlerBlack5424.69 13typewriter ribbon7Ribbons & UsAdlerBlack + Red2730.30 With this table, you could get several groups. For example, you can calculate the average price of a product by the typewriter brand. Here’s the query that will do that: SELECT typewriter_brand, AVG(price) AS average_price FROM typewriter_products GROUP BY typewriter_brand; This query groups data by the typewriter brand and calculates the average price. The query will output this table: typewriter_brandaverage_price Adler26.25 Olympia11.89 Underwood26.77 The output shows three types of typewriter brands and the average product price for each typewriter. Or you could find the number of ribbons available by color with this query: SELECT ribbon_color, SUM(units) AS sum_units FROM typewriter_products GROUP BY ribbon_color; The group in this query is the ribbon color. I also use the SUM() aggregate function to sum the number of units per ribbon color. Here’s the result: ribbon_colorsum_units Black508 Black + Red265 Data is grouped into two rows: black ribbon and black + red ribbon. For each ribbon color, there’s a number of units available for sale. This is a sneak peek of what GROUP BY does and how it works. As you can see, the GROUP BY syntax is relatively simple. If you need some more clarification on the GROUP BY syntax, take a look at this article. How can you write your own queries and make use of the GROUP BY clause? Here are some tips to help you. Steps to Use GROUP BY How to use GROUP BY in SQL? I’ll give you a step-by-step recipe in this section. Let’s imagine I want to find the number of records for each product. The first step in writing a query should be finding an adequate grouping column. In this case, it’s product_id. We put this column in the GROUP BY clause: SELECT … GROUP BY product_id; The second step is to choose the right aggregate function and use it in the SELECT statement. Since our goal is to find the number of records, we use the COUNT() function. Of course, you also need to specify the column in the FROM clause: SELECT COUNT(*) FROM typewriter_products GROUP BY product_id; Will this query run? Of course! It will return this output. count 2 2 1 2 2 2 2 As you can see, this is not very helpful. All the products have two records except one, but what are these products? We don’t have a clue! That’s why the third step is important. In this step, write the grouping column in the SELECT, too: SELECT product_id, COUNT(*) FROM typewriter_products GROUP BY product_id ORDER BY product_id; Now, the query will show this output. product_idcount 12 22 32 41 52 62 72 This is much more helpful, right? Now you know that the product with ID 4 has only one occurrence in the table. You probably noticed I used COUNT(*) in the above queries. This is not the only way of using that aggregate function. You can also learn about all the options for using COUNT(). By comparing the two queries above, you can see it’s possible to use a column in the GROUP BY but not use it in SELECT; the grouping still works. And what if you used the column in the SELECT statement but not in the GROUP BY? No, you can’t do that. The general rule is: If the column is in SELECT and is not used in an aggregate function, it must be listed in the GROUP BY clause. Here are some more details about this rule. Other Tips for Using GROUP BY in SQL While using GROUP BY seems to be rather easy (and it is!), there are some other tips and tricks that will make using it a much more comfortable experience. Use a Unique Identifier When choosing which column to group by, you should generally use the column that uniquely identifies the group. If you don’t do that, the result you get could be misleading or just plain wrong. For example, let’s show the products by typewriter brand but try grouping the data by product name. The query … SELECT product_name, typewriter_brand, COUNT(*) AS product_count FROM typewriter_products GROUP BY product_name, typewriter_brand ORDER BY typewriter_brand; … will output the following table: product_nametypewriter_brandproduct_count typewriter ribbonAdler4 typewriter ribbonOlympia4 typewriter ribbonUnderwood5 If you grouped data this way, you would get the wrong impression there’s only one product for each of the three typewriter brands – one product appearing four times for Adler and Olympia typewriters and five times for Underwood. What if you used the column product_id instead of product_name? SELECT product_id, typewriter_brand, COUNT(*) AS product_count FROM typewriter_products GROUP BY product_id, typewriter_brand ORDER BY typewriter_brand; Now you see the result is a little different: product_idtypewriter_brandproduct_count 6Adler2 7Adler2 1Olympia2 2Olympia2 3Underwood2 4Underwood1 5Underwood2 There are still products for three typewriter brands. However, now you know there are two products for Adler and Olympia, both appearing twice for each typewriter. There are three products for Underwood. Not only that, but you also know which products they are. Let’s look at this example: SELECT product_id, product_name, typewriter_brand, COUNT(*) AS product_count FROM typewriter_products GROUP BY product_id, product_name, typewriter_brand ORDER BY typewriter_brand; This is an extended version of the previous query. We again group by product_id. To get the number of product occurrences, we use the COUNT() function. We also want more labels, so we add the product name and the typewriter brand to the SELECT. Since these columns appear in the SELECT, they must also appear in GROUP BY. Note that all the columns do so except the one with the aggregate function. Have a look at the output: product_idproduct_nametypewriter_brandproduct_count 6typewriter ribbonAdler2 7typewriter ribbonAdler2 1typewriter ribbonOlympia2 2typewriter ribbonOlympia2 3typewriter ribbonUnderwood2 4typewriter ribbonUnderwood1 5typewriter ribbonUnderwood2 You can see this output gives the clearest picture. There are seven different products, but they are all typewriter ribbons. These products are for three typewriter brands. All products have two occurrences, except product #4. Grouping by Value You don’t always have to group by the ID column. It’s also possible to group data by value. For example, if you wanted to know how many units there are by the ribbon color, you’d use this query: SELECT ribbon_color, SUM(units) AS units_sum FROM typewriter_products GROUP BY ribbon_color; Here we use ribbon color as the grouping criteria. The aggregate function SUM() will total the units in the following way: ribbon_colorunits_sum Black508 Black + Red265 The output shows there are 508 black ribbons and 265 black and red ribbons. If you’re not familiar with summing the values, check out the SUM() function explained. The MIN() & MAX() functions are also commonly used with GROUP BY. Two Columns as the Group Definer The GROUP BY clause allows you to group data by two or more columns; you already saw that. But it’s also possible to use two columns as the unique identifier of a group. For example: SELECT ribbon_color, typewriter_brand, SUM(units) AS units_sum FROM typewriter_products GROUP BY ribbon_color, typewriter_brand ORDER BY typewriter_brand; In this query, we use the columns ribbon_color and typewriter_brand to define a group. Each ribbon color for a specific typewriter will be shown only once. For such a defined group, we calculate the number of ribbons available: ribbon_colortypewriter_brandunits_sum BlackAdler98 Black + RedAdler43 BlackOlympia185 Black + RedOlympia195 BlackUnderwood225 Black + RedUnderwood27 Grouping by an Expression Instead of grouping only by columns, it’s also possible to group data by an expression. Let’s take a look at the following query: SELECT ribbon_brand, ribbon_color, units*price AS product_value, SUM(units*price) AS product_value_sum FROM typewriter_products GROUP BY ribbon_brand, ribbon_color, units*price ORDER BY ribbon_brand; Here we select the ribbon brand and color. Additionally, we also calculate the value of each group by multiplying the unit by the price. Then we group data by the columns ribbon_brand and ribbon_color. This is something you’re used to. But we also add the formula for calculating the value to the GROUP BY clause. We want to show only unique values by ribbon brand and color. If there are several of the same calculated values within the same group, they will be shown as one row. To make the aggregation more obvious, I added the sum of product values. You’ll soon see why. ribbon_brandribbon_colorproduct_valueproduct_value_sum All You NeedBlack346.36346.36 All You NeedBlack880.001,760.00 All You NeedBlack + Red327.21327.21 All You NeedBlack + Red480.00960.00 Our RibbonBlack1,350.001,350.00 Ribbons & UsBlack1,201.831,201.83 Ribbons & UsBlack1,333.261,333.26 Ribbons & UsBlack4,783.794,783.79 Ribbons & UsBlack + Red398.58398.58 Ribbons & UsBlack + Red818.10818.10 Ribbons & UsBlack + Red2,227.052,227.05 It might look like this table isn’t grouped at all. But let’s take a closer look. If the data weren’t grouped, the brand All You Need would have appeared six times, the same as Ribbons & Us. But it appears only four times. Why? Because the values 880.00 and 480.00 appear two times each, so they’re grouped together. This is what the column product_value_sum tells you in the colored rows. These are the only rows where this column is different from product_value. The green row has a sum of 1,760.00 because value 880.00 appears twice. The red row’s sum is 960.00 because 480.00 appears twice. Steps to Writing a Query With GROUP BY and Aggregate Functions You’ve already internalized GROUP BY through writing all the above queries. But I think it’s worth having this as a separate list – the steps checklist. Step 1: Identify the grouping column(s), i.e. a column or columns you want to group the data by. After you’ve identified it, put it in the GROUP BY clause. Step 2: Depending on the metric you want to calculate, choose the appropriate aggregate function and use it in the SELECT statement. Step 3: Use the grouping column in the SELECT statement. That way, you’ll get data labels for each group, not only the aggregate function output. No SQL User Can Avoid Using GROUP BY No SQL user known to me has managed to live a successful SQL life and write queries without GROUP BY. It’s impossible! So why not practice GROUP BY and other SQL expressions and functions? The wise choice would be our SQL Practice Set. It offers you the opportunity to practice all the different ways of using GROUP BY covered in the article. Some other useful resources for GROUP BY examples are available on our blog; the concept also often appears in SQL job interview questions. Tags: sql GROUP BY