11th Aug 2021 9 minutes read How to Find the Nth-Highest Salary by Department with SQL Tihomir Babic sql learn sql Window Functions Table of Contents The Data I’ll Be Using What Is the Task Here? Using NTH_VALUE() Using ROW_NUMBER() Using RANK() Using DENSE_RANK() Overview of the Concepts and Their Differences Different Ways to Get the SQL Nth Value Learn how to find the nth-highest salary in SQL, and you’ll learn how to get the nth value in any data. In this article, I’ll focus on one problem and give you several solutions to that problem. The problem is stated in the title already: find the nth-highest salary by department using SQL. This may sound too specific. But by learning how to solve this, you’ll be able to find the nth value in any data, not just salaries. You’ll get an idea of how to solve other similar problems you come across. As I said, we will look at just one problem but explore several solutions for it. In fact, I’ll show you four solutions using these window functions: NTH_VALUE() ROW_NUMBER() RANK() DENSE_RANK() To learn about SQL window functions, I recommend our interactive Window Functions course. It contains over 200 hands-on exercises to teach you all there is to know about window functions. I you already know window functions, consult our SQL Window Functions Cheat Sheet. The Data I’ll Be Using We will use data stored in the following two tables: employee department Here’s what the data in the table employee looks like: idfirst_namelast_namesalarydepartment_id 1GoraudTomankiewicz7,231.061 2HarwellWinny8,139.511 3BastienGoosnell4,574.201 4ZachariahRapi6,657.111 5GiustinoCruikshank5,555.631 6AbraClemon5,564.253 7HurleeDrance9,790.162 8RozannaMcIlvoray3,201.182 9IveStrathdee9,300.252 10LoisSkain5,371.022 11DeborHolby2,804.293 10HadrianRobatham2,615.783 13DixSowter6,378.123 14LeslieSandle8,805.703 15DagnyRosier2,041.263 The table department has the following data: iddepartment_name 1Research and Development 2Accounting 3Human Resources What Is the Task Here? Let’s find the third-highest salary by department. This means finding the third-highest value, not overall but within each subset, where a subset has the salaries for a given department. The most helpful tool for doing this is the window functions. So, here’s the first solution using a window function. Using NTH_VALUE() The purpose of the NTH_VALUE() function is to get the value of the nth row in the dataset. Here’s how we can use it to get the third-highest salary by department: SELECT e.first_name, e.last_name, d.department_name, salary, NTH_VALUE (salary, 3) OVER ( PARTITION BY department_name ORDER BY salary DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS third_highest_salary FROM department d JOIN employee e ON d.id = e.department_id; This code first selects some columns from the tables employee and department. To use NTH_VALUE(), we have to specify the column and the value of N. Since we want to get the third-highest salary, the column is salary, and N = 3; hence, we have NTH_VALUE(salary, 3). This will get us the third-highest salary. For a window function to work, we need to use an OVER() clause. Remember, we are looking for the third-highest salary in each department. To do this, we use PARTITION BY to divide the data into subsets by the column department_name from the table department. Since we’re looking for the third-highest salary, we use ORDER BY to make the window functions perform the calculations from the highest salary to the lowest. This is the reason for ORDER BY salary DESC in the code. Also, we need the RANGE clause, in this case, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This means the window function will look between the first row and the last row of the partition. Of course, to get all this, we need to join the two tables. Here’s the result: first_namelast_namedepartment_namesalarythird_highest_salary HurleeDranceAccounting9,790.165,371.02 IveStrathdeeAccounting9,300.255,371.02 LoisSkainAccounting5,371.025,371.02 RozannaMcIlvorayAccounting3,201.185,371.02 LeslieSandleHuman Resources8,805.705,564.25 DixSowterHuman Resources6,378.125,564.25 AbraClemonHuman Resources5,564.255,564.25 DeborHolbyHuman Resources2,804.295,564.25 HadrianRobathamHuman Resources2,615.785,564.25 DagnyRosierHuman Resources2,041.265,564.25 HarwellWinnyResearch and Development8,139.516,657.11 GoraudTomankiewiczResearch and Development7,231.066,657.11 ZachariahRapiResearch and Development6,657.116,657.11 GiustinoCruikshankResearch and Development5,555.636,657.11 BastienGoosnellResearch and Development4,574.206,657.11 The result tells us the third-highest salary in Accounting is 5,371.02, which is Lois Skain’s salary. Abra Clemon’s salary (5,564.25) is the third-highest salary in Human Resources. In Research and Development, Zachariah Rapi’s salary is the third highest (6,657.11). Using ROW_NUMBER() The second option for getting the third-highest salary by department is to use ROW_NUMBER(). This window function returns the sequence numbers of the rows in a dataset. If we order the salaries within each department, it'll be easy to pick the third highest. Here’s what the solution looks like: SELECT e.first_name, e.last_name, d.department_name, salary, ROW_NUMBER() OVER ( PARTITION BY d.id ORDER BY salary DESC ) AS salary_rank FROM department d JOIN employee e ON d.id = e.department_id ORDER BY department_name; This solution selects all the same columns as in the previous solution. The difference is we use ROW_NUMBER() here instead of NTH_VALUE(). All criteria in the OVER() clause are the same as in the previous solution. The result looks like this: first_namelast_namedepartment_namesalarysalary_rank HurleeDranceAccounting9,790.161 IveStrathdeeAccounting9,300.252 LoisSkainAccounting5,371.023 RozannaMcIlvorayAccounting3,201.184 LeslieSandleHuman Resources8,805.701 DixSowterHuman Resources6,378.122 AbraClemonHuman Resources5,564.253 DeborHolbyHuman Resources2,804.294 HadrianRobathamHuman Resources2,615.785 DagnyRosierHuman Resources2,041.266 HarwellWinnyResearch and Development8,139.511 GoraudTomankiewiczResearch and Development7,231.062 ZachariahRapiResearch and Development6,657.113 GiustinoCruikshankResearch and Development5,555.634 BastienGoosnellResearch and Development4,574.205 It looks a little different from the result of the previous solution. The last column does not contain the value of the third-highest salary. Instead, it contains a ranking, and we can easily see the effect is the same. The values of the third-highest salary by department are 5,371.02 (Accounting), 5,564.25 (Human Resources), and 6,657.11 (Research and Development), the same as what we got in the first solution. We can also use ROW_NUMBER() in a CTE. If you’re not familiar with CTEs, this article explains what it is. For example, we can write the code below: WITH salaries_ranks AS ( SELECT e.first_name, e.last_name, d.department_name, salary, ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY salary DESC ) AS salary_rank FROM department d JOIN employee e ON d.id = e.department_id ) SELECT * FROM salaries_ranks WHERE salary_rank = 3; We start our CTE using a WITH, followed by the CTE name, salaries_ranks. Then, we write a SELECT statement, which is exactly the same as in the previous code. Now that we’ve written a CTE, we can use it like any other table. We do that in a separate SELECT statement which selects all the columns from the CTE salaries_ranks, showing only the rows where salary_rank = 3. The result from this code is: first_namelast_namedepartment_namesalarysalary_rank ZachariahRapiResearch and Development6,657.113 LoisSkainAccounting5,371.023 AbraClemonHuman Resources5,564.253 This is quite an elegant solution. We get only the necessary data – just three rows showing the third-highest salary for each department. Using RANK() The third option is to use the RANK() function. It is similar to ROW_NUMBER() in that it also ranks the rows within a partition. Similar, but not the same. I’ll talk about the differences later on, or you can read about them here. Let’s write the code using RANK(): SELECT e.first_name, e.last_name, d.department_name, salary, RANK() OVER ( PARTITION BY d.department_name ORDER BY salary DESC ) AS salary_rank FROM department d JOIN employee e ON d.id = e.department_id; Again, this code is not much different from the previous ones except for the specific window function used. Here, it’s RANK(), with all the same criteria in the OVER() clause. The result will be exactly the same as when we used ROW_NUMBER(), so I don’t think there’s any need to show the same result table again. Using DENSE_RANK() The last solution I will show you here is the DENSE_RANK() window function. Just like ROW_NUMBER() and RANK(), it ranks the values within a dataset. The code is not really different, either. It’s the same as before, just using a different window function: SELECT e.first_name, e.last_name, d.department_name, salary, DENSE_RANK() OVER ( PARTITION BY d.department_name ORDER BY salary DESC ) AS salary_rank FROM department d JOIN employee e ON d.id = e.department_id; When we run this code, we get exactly the same result as we get with the previous two solutions. Overview of the Concepts and Their Differences As you have seen, you can use any of these four window functions to get the third-highest salary by department. The NTH_VALUE() function explicitly shows you the value of the third-highest salary by department. The ROW_NUMBER(), RANK(), and DENSE_RANK() functions rank the salaries within each department. Then, you can simply find the salary value associated with rank number 3. These three functions are similar but not the same. They seem identical in this example, but other data may expose their differences. Here’s an explanation of how they are different to save you some unpleasant surprises. The differences surface when you have ties (multiple rows with the same value) in your data. I knew I didn’t have any ties in my data, so I knew all three functions would give me the same solution. But what if you do have ties? In that case, ROW_NUMBER() would sequentially allocate ranks; that is, the ties would have different rank numbers that are arbitrarily assigned. In contrast, RANK() allocates the same rank number to the same values and skips a rank number to compensate for it when it comes to a row with a different value. Finally, DENSE_RANK() allocates the same rank number to ties, without skipping a rank number at the next value. Here’s an example to illustrate: employeesalaryrow_number()rank()dense_rank() 12,000111 24,000333 33,000222 48,000554 54,000433 If you’re looking for the value of the third-highest salary in this data, ROW_NUMBER() will give you the correct solution; the salary of 4,000 is the third-highest. However, it would give you an incorrect solution if you’re looking for all the employees with the third-highest salary. It would show only employee 2 when employee 5 should also be shown. In this case, RANK() or DENSE_RANK() would be a better choice. What if you’re looking for the value of the fourth-highest salary? The ROW_NUMBER() function would give you a totally incorrect solution, since the third- and the fourth-highest values are the same. Using RANK() returns no result at all, because it skips the rank number four – as explained earlier, it allocates the same rank number to ties and skips the next rank number to compensate for it. Only DENSE_RANK() gives you a correct solution in this case. Before deciding which function to use, be sure to understand the data and what you are trying to get as the solution. The safest way is to use NTH_VALUE(). If you want to use a ranking function, it’s generally best to use DENSE_RANK() when you don’t know if there are any ties in the data. If you use ROW_NUMBER() or RANK() instead, make sure you know what the result would look like. Maybe one or both these two functions give you what you need. Choose the function according to your needs. You could also use these four functions when you want to find the maximum or the minimum salary by department. Or, for example, the most or the least expensive product by product category. Or any product with the nth-highest price. You could look for the nth-highest or the nth-lowest sales, revenue, hours worked, costs, number of likes, logins, engagements, streams, comments, etc.; you name it. If you use MySQL, this comprehensive introduction will show you these four and all other window functions in MySQL. Different Ways to Get the SQL Nth Value The NTH_VALUE() function is perfect for finding the nth-highest salary or the nth value of any other column. After all, it was designed exactly for that purpose. I’ve shown you ways to get the same effect with three additional window functions: ROW_NUMBER(), RANK(), and DENSE_RANK(). Use whichever best suits your needs and data. But these are just four window functions. It is just the tip of the iceberg, and there is so much more! These and other window functions are explained in this course on window functions. Tags: sql learn sql Window Functions