h1

SQL SERVER: Find Duplicate Rows with Row_Number

12/31/2008

This is very known question to find Duplicate Rows from the Table and Delete Duplicate rows from the table

I came to know about “ROW_NUMBER()” of SQL SERVER 2005.

Lets see how we can use this feature:

There is one Table:

CREATE TABLE FindDuplicates(
EmpID INT IDENTITY(1,1),
EmpName VARCHAR(500)
)

INSERT some data into the table:

INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('B')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')

Now we have data like:

Find Duplicate Rows

Now, we need to Find the duplicate Rows from the table. So we need to get “A”, “C” and “D”.

Lets see the use of Row_Number():

;With CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
EmpID,
EmpName
FROM FindDuplicates
)
SELECT *
FROM CTE
WHERE RowID > 1

Output will be like:

Find Duplicate Rows

So we can Delete Duplicate Rows as we have EmpID (Primary Key)

I Used:

PARTITION BY: which will find same rows and assign ID accordingly.

ORDER BY: In which order we want to assign ID. By Default its Ascending

About these ads

7 comments

  1. Fantastic! If I know that some days ago … :(

    Thank you :)


  2. Your query won’t work if you had more than two records with a duplicate EmpName value.

    Standard grouping syntax would probably work as well…

    SELECT MIN (EmpID) AS EmpID, EmpName
    FROM FindDuplicates
    GROUP BY EmpName

    While you don’t get the rank number, you really don’t need it in the result set. If you don’t want the first record as the query above, use MAX instead for the EmpID.

    Cheers,

    Ben


  3. Hi benstaylor,

    If I have more than two records, then it will give me both the duplicate Emp Name.

    Lets say now I insert like this:

    INSERT INTO FindDuplicates VALUES(‘A’)
    INSERT INTO FindDuplicates VALUES(‘A’)
    INSERT INTO FindDuplicates VALUES(‘A’)
    INSERT INTO FindDuplicates VALUES(‘B’)
    INSERT INTO FindDuplicates VALUES(‘C’)
    INSERT INTO FindDuplicates VALUES(‘C’)
    INSERT INTO FindDuplicates VALUES(‘C’)
    INSERT INTO FindDuplicates VALUES(‘D’)
    INSERT INTO FindDuplicates VALUES(‘D’)

    this query will return me
    RowID EmpID EmpName
    2 2 A
    3 3 A
    2 6 C
    3 9 C
    2 8 D

    Could you let me know what is the problem?

    Thanks,

    Tejas


  4. very nice!


  5. Also see here how to use row_number() function for various purposes

    http://beyondrelational.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx

    Madhivanan


  6. Hi,

    I am in urgent need of a solution. My problem is, i need to retrieve the data which has duplicate records and need to have the row number start from 1 to the number of records.

    Ex:

    ProductID Amount RowNumber
    1 200.00 1
    1 330.00 2
    2 400.00 1
    2 600.00 2
    2 350.00 3
    3 200.00 1
    4 330.00 1
    5 200.00 1
    5 330.00 2

    Please help me in solving this.

    Thanks,
    Niranjan


  7. very helpfull



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: