Following query would identify the duplicate rows on a table using self join using rowid among duplicates. For faster execution, this is suggested to use unique/primary key values in select and in joinings. If there is no PK/Unique key then its safe to specify all the columns associated with table in joining.
First way:
1. Identify Duplicates
select EMPNO,DEPT_ID, GRADE
from SCOTT.EMP a
where
rowid >
(select min(rowid) from SCOTT.EMP b
where
b.EMPNO = a.EMPNO
and
b.DEPT_ID = a.DEPT_ID
and b.GRADE = a.GRADE
);
See the output, do some manual verification and go for deletion with following query.
delete from
SCOTT.EMP a
and
a.rowid >
any (select b.rowid
from
SCOTT.EMP b
where
b.EMPNO = a.EMPNO
and
b.DEPT_ID = a.DEPT_ID
and b.GRADE = a.GRADE
);
Second way:
1. Identify duplicates.
select count(*) from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
COUNT(*)
———-
2528
———-
2528
2. 2258 duplicates exist – these can be deleted with command below -
delete from RF_BUSINESS_UNIT_HISTORY WHERE ROWID IN
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
(select rowid from RF_BUSINESS_UNIT_HISTORY
MINUS
select max(rowid) from RF_BUSINESS_UNIT_HISTORY
GROUP BY
CALENDAR_YEAR_MONTH,BUSINESS_UNIT_NO,BUSINESS_UNIT_NAME);
No comments:
Post a Comment