How to Find Duplicate Rows in SQL?
Database:
Operators:
Table of Contents
Problem:
You have duplicate rows in your table, with only the IDs being unique. How do you find those duplicate entries?
Example:
Our database has a table named product
with data in the following columns: id
, name
, and category
.
id | name | category |
---|---|---|
1 | steak | meat |
2 | cake | sweets |
3 | steak | meat |
4 | pork | meat |
5 | cake | sweets |
6 | cake | sweets |
Let’s find duplicate names and categories of products. You can find duplicates by grouping rows, using the COUNT
aggregate function, and specifying a HAVING
clause with which to filter rows.
Solution:
SELECT name , category, FROM product GROUP BY name , category HAVING COUNT (id) > 1; |
This query returns only duplicate records—ones that have the same product name and category:
name | category |
---|---|
steak | meat |
cake | sweets |
There are two duplicate products in our table: steak from the meat category and cake from the sweets category. The first product is repeated two times in the table, while the second appears three times.
Discussion:
To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY
and a HAVING
clause.
The first step is to create groups of records with the same values in all non-ID columns (in our example, name
and category
). You do this with a GROUP BY
clause. After the GROUP BY
keyword, you put the names of the columns you want to use for grouping. We exclude the id
column because it’s our table’s primary key; by definition, each row will have a different value under that column. If we were to include it, then we would not be able to detect duplicates!
We want to find groups with more than one row; such groups must contain a duplicate by definition, as long as we’ve grouped on the correct columns. To do this, we use a HAVING
clause. The condition we specify is that the number of elements in the group—COUNT(id)
—must be greater than one: COUNT(id) > 1
. Remember that HAVING
allows you to filter groups; WHERE
is for filtering individual rows.