Following query would identify the duplicate rows on a table using self join using rowid among duplicates. For faster execution, this is suggested to use unique/primary key values in select and in joinings. If there is no PK/Unique key then its safe to specify all the columns associated with table in joining.
First way:
1. Identify Duplicates
select EMPNO,DEPT_ID, GRADE 
  from SCOTT.EMP a 
where
   rowid >
     (select min(rowid) from SCOTT.EMP b
      where
         b.EMPNO = a.EMPNO
      and
         b.DEPT_ID = a.DEPT_ID
      and b.GRADE = a.GRADE
    );
See the output, do some manual verification and go for deletion with following query.
delete from
   SCOTT.EMP a 
and
   a.rowid >
   any (select b.rowid
   from
      SCOTT.EMP b
   where
         b.EMPNO = a.EMPNO
      and
         b.DEPT_ID = a.DEPT_ID
      and b.GRADE = a.GRADE
   );
Second way:
1. Identify duplicates.
select count(*) from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
COUNT(*)
———-
2528
———-
2528
2. 2258 duplicates exist – these can be deleted with command below -
delete from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
 
 
No comments:
Post a Comment