27th Apr 2017 5 minutes read NULL Values and the GROUP BY Clause Maria Alcaraz aggregate functions NULL GROUP BY Table of Contents The GROUP BY Clause and NULL Values Aggregate Functions and Null Values The ORDER BY Clause and NULL Values Boolean Expressions Involving NULLS TRY IT YOURSELF! We've already covered how to use the GROUP BY clause and some aggregation functions like SUM(), AVG(), MAX(), MIN(), COUNT(). In this article, we will explain how the GROUP BY clause works when NULL values are involved. We will also explain about using NULLs with the ORDER BY clause. The best way to master GROUP BY and NULL in SQL is through practice. I recommend the SQL Practice track at LearnSQL.com. It contains over 600 hands-on exercises to help you gain confidence in your skills. In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. For more details, check out Wikipedia's explanation of NULL in SQL. We will use the following employee table to illustrate how the GROUP BY clause works with NULL values. EMPLOYEE TABLE EmplidNameDepartmentSalary 100John Smith IT 2000 101Jean Pellu NULL 2500 102Mary Popins FINANCES2000 103Blas MerrieuNULL NULL 104Joan Piquet IT 1000 105Jose Gomez IT NULL The GROUP BY Clause and NULL Values Let's start by executing a simple SQL query with both the GROUP BY clause and NULL values: SELECT department FROM employee GROUP BY department; RESULTS department 1. 2. IT 3. FINANCES Note: I've added a numbered list here for clarity; usually the results would be shown as an unnumbered list. We can see that the first result value is a NULL represented by an empty string (the empty line before the IT department). This empty space represents all the NULL values returned by the GROUP BY clause, so we can conclude that GROUP BY treats NULLs as valid values. In the next query, we will count how many employees are in each department, including the "NULL" department: SELECT department, count(*) FROM employee GROUP BY department; RESULTS department count(*) 1. 2 2. IT 3 3. FINANCES 1 Note: I've added a numbered list here for clarity; usually the results would be shown as an unnumbered list. Analyzing the previous results from a "GROUP BY perspective", we can conclude that all NULL values are grouped into one value or bucket. This makes it look like NULL is one department with two employees. However, treating NULLs this way – grouping many NULLs into one bucket – does not align with the concept that a NULL value is not equal to any other value, even another NULL. To explain why NULLs are grouped into one bucket, we need to review the SQL standard. SQL defines “any two values that are equal to one another, or any two NULLs”, as “not distinct”. This definition of "not distinct" allows SQL to group and sort NULLs when the GROUP BY clause (or other keywords that perform grouping) is used. There is another confusing point in the previous result: the way the NULL is represented (by a blank line) is not clear. One interesting way to solve this issue is to use the COALESCE function, which converts NULLs to a specific value but leaves other values unchanged. Let's see the following query: SELECT coalesce(department,'Unassigned department'), count(*) FROM employee GROUP BY 1; RESULTS department count(*) IT 3 Unassigned department 2 FINANCES 1 Aggregate Functions and Null Values Until now we've been working with the NULL values in the department column, and we've only used the GROUP BY clause. Let's try executing some queries using NULL values as parameters in aggregate functions. First, we'll use the COUNT() function: SELECT COUNT(salary) as "Salaries" FROM employee RESULTS Salaries 4 Without the DISTINCT clause, COUNT(salary) returns the number of records that have non-NULL values (2000, 2500, 2000, 1000) in the salary column. So, we can conclude that COUNT doesn't include NULL values. Let's try using the COUNT(distinct column) aggregate function, which counts all the different values in a column. How does this treat NULL values? SELECT COUNT(distinct salary) as "Different Salaries" FROM employee RESULTS Different Salaries 3 The query returned a "3", but there are four different salaries: 2000, 2500, 1000, and NULL. Again, we can conclude that the NULL is not included in the resulting value. Let's see another example, this time using the AVG() aggregate function: SELECT coalesce(department,'Unassigned department'), AVG(salary) FROM employee GROUP BY 1 RESULTS department count(*) Unassigned department 2500 IT 1500 FINANCES 2000 Let's analyze if NULL values are included in the AVG() function. The IT department has three employees with the following salary values: 2000, 1000, and NULL. The AVG result for IT is 1500, so it is clear that the NULL value is not considered in the average calculation. (Because (1000 + 2000 ) / 2 = 1500.) The conclusion is that averages are only calculated using non-NULL values. The general rule is NULL values are not considered in any aggregate function like SUM(), AVG(), COUNT(), MAX() and MIN(). The exception to this rule is the COUNT(*) function, which counts all rows, even those rows with NULL values. Here's an example: SELECT COUNT(*) as "Total Records" FROM employee RESULTS Total Records 6 As we can see, COUNT(*) returns the total number of records in the "employee" table, even those records with NULL values in some or all fields. The ORDER BY Clause and NULL Values The SQL standard does not explicitly define a default sort order for NULLs. Some databases like Oracle and PostgreSQL use a NULLS FIRST or NULLS LAST specification to indicate the place of the NULL value. The following example shows this feature: SELECT department, COUNT(*) as "Num of employees" , AVG(salary) as "Avg Dept. Salary" FROM employee GROUP BY department ORDER BY department NULLS LAST RESULTS department Num of employees Avg Dept. Salary FINANCES 1 2000 IT 3 1500 2 2500 Boolean Expressions Involving NULLS We normally see TRUE or FALSE as a Boolean result, but it is usual for expressions or conditions that include a NULL to return an UNKNOWN result. The UNKNOWN result is covered in detail in another article we previously published here on our blog. TRY IT YOURSELF! There are many relational database features and functions that produce a specific behavior whenever a NULL value is involved. You can learn more in the LearnSQL's Standard SQL Functions course. Try it out for free! Tags: aggregate functions NULL GROUP BY