Friday, August 06, 2010

Delete duplicate rows from a table



Following query would identify the duplicate rows on a table using self join using rowid among duplicates. For faster execution, this is suggested to use unique/primary key values in select and in joinings. If there is no PK/Unique key then its safe to specify all the columns associated with table in joining.

First way:
1. Identify Duplicates
select EMPNO,DEPT_ID, GRADE
  from SCOTT.EMP a
where
   rowid >
     (select min(rowid) from SCOTT.EMP b
      where
         b.EMPNO = a.EMPNO
      and
         b.DEPT_ID = a.DEPT_ID
      and b.GRADE = a.GRADE
    );

2. Delete Duplicates
See the output, do some manual verification and go for deletion with following query.

delete from
   SCOTT.EMP a
and
   a.rowid >
   any (select b.rowid
   from
      SCOTT.EMP b
   where
         b.EMPNO = a.EMPNO
      and
         b.DEPT_ID = a.DEPT_ID
      and b.GRADE = a.GRADE
   );


Second way:

1. Identify duplicates.

select count(*) from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
COUNT(*)
———-
2528

2. 2258 duplicates exist – these can be deleted with command below -

delete from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);



No comments: