Thursday, June 10, 2010

Repair corrupt oracle blocks in 11g

Oracle has introduced a new feature in 11g which can be used to get assistance in case of loss or corruption of datafiles, redo log files or controlfiles. This tool is called data recovery advisor and is accessible from the command line as well as from the GUI.

The command line interface is implemented as new commands for RMAN, the graphical interface is integrated into the Enterprise Manager. In the following little demo I want to show how we can deal with corrupted blocks in11g.

I proactively check for block corruption in the database on a regular basis and strongly recommend this to my customers.

By default RMAN checks for physically corrupted blocks automatically with every backup it creates, no matter if it is a backup set or an image copy.

Logical corruption is only checked with the additional syntax CHECK LOGICAL. The VALIDATE keyword causes that no physical backup is created, but RMAN only reads all blocks through the input puffers but does not write them to the output buffer into a backup set.
RMAN> backup validate check logical database;

Starting backup at 03-FEB-08
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=131 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/app/oracle/oradata/orcl/system01.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/orcl/sysaux01.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/orcl/undotbs01.dbf
input datafile file number=00005 name=/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/orcl/users01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:26
List of Datafiles
. . .

=================



In 11g RMAN directly informs us about corrupted blocks it has found in the output of backup command and creates a trace file in the Automatic Diagnostic Repository (ADR) :

. . .
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1 OK 0 11128 76800 622989
File Name: /u01/app/oracle/oradata/orcl/system01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 54455
Index 0 9080
Other 0 2137

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2 OK 0 20756 59896 622994
File Name: /u01/app/oracle/oradata/orcl/sysaux01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 10864
Index 0 9779
Other 0 18497

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3 OK 0 0 40960 622994
File Name: /u01/app/oracle/oradata/orcl/undotbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 0
Index 0 0
Other 0 40960

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4 OK 0 632 640 11608
File Name: /u01/app/oracle/oradata/orcl/users01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 0
Index 0 0
Other 0 8

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5 FAILED 0 523 1280 602803
File Name: /u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 708
Index 0 1
Other 4 48

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10321.trc for details
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type Status Blocks Failing Blocks Examined
------------ ------ -------------- -----
SPFILE OK 0 2
Control File OK 0 594
Finished backup at 03-FEB-08


This already is a great improvement because in pre 11g Releases we would have had to find out yourself if there were corrupted blocks found by selecting the information in the dynamic view v$database_block_corruption.

A check like this which has returned corrupted blocks in 11g immediately creates a failure in the database. A failure could also be produces by an inaccessible datafile or a lost control- or redo logfile.

There are multiple checkers available in 11g which are automatically run by the server as soon as a critical error is raised. These include health checks like database structure checks, integrity checks for the data dictionary as well as block corruption check.

These checks can also be started manually by the DBA by using the new package DBMS_HM .

Also the DBA can run these checks from the ADVISER CENTRAL in OEM. There is a new checkers-pane next to the advisers.

Once a failure has been detected it is possible to ask RMAN for possible repair options.

RMAN has the following set of new commands to deal with failures:



- LIST FAILURE

- ADVISE FAILURE

- CHANGE FAILURE

- REPAIR FAILURE



Let’s first have a look at the LIST FAILURE command:

RMAN> list failure;

List of Database Failures
=========================

Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
21821 HIGH OPEN 03-FEB-08 Datafile 5: '/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf' contains one or more corrupt blocks

Now the question for the DBA is



Á “What is the fastest way
to get the problem solved?”

And it might not always be easy to give the correct answer:

It could be that it is faster to restore the file(s) and recover it (them). This would be the case if very many blocks need media recovery.

Or it could be faster to get the list of corrupted blocks and use block media recovery with RMAN to resolve the situation.

As of 11g RMAN can also choose to use block copies from the flashback logs instead of a backup for block media recovery.

The ADVISE FAILURE command comes in very handy here:
RMAN> advise failure;

List of Database Failures
=========================

Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
21821 HIGH OPEN 03-FEB-08 Datafile 5: '/u01/app/oracle/oradata/orcl/ORCL/datafile/o1_mf_encrypt__3sd9vvs5_.dbf' contains one or more corrupt blocks

analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
no manual actions available

Automated Repair Options
========================
Option Repair Description
------ ------------------
1 Recover multiple corrupt blocks in datafile 5
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm


This command gives us comprehensive information about the best possible option for a repair and it also automatically generates a repair script which contains the commands that should be run for the repair action. This script can either be run manually or with a repair command. RMAN> host;


[oracle@rhas4 ~]$ cat /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm

# block media recovery for multiple blocks
recover datafile 5 block 43 to 46;

RMAN> repair failure;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm

contents of repair script:
# block media recovery for multiple blocks
recover datafile 5 block 43 to 46;

At this point RMAN asks us to confirm the repair actoin.
It is also possible to run the REPAIR FAILURE command with NOPROMPT.

Do you really want to execute the above repair (enter YES or NO)? yes
executing repair script

Starting recover at 03-FEB-08
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s) from datafile copy /u01/app/oracle/flash_recover/ORCL/datafile/o1_mf_encrypt__3t1w5bkp_.dbf

starting media recovery
media recovery complete, elapsed time: 00:00:07

Finished recover at 03-FEB-08
repair failure complete



The REPAIR FAILURE command can be run with PREVIEW option which does not execute the repair script but shows it.

There are a number of dynamic views for monitoring failures and the repair actions:

SYSTEM @ orcl SQL> SELECT table_name FROM dict WHERE lower(table_name) like 'v$ir%';

TABLE_NAME
------------------------------
V$IR_FAILURE
V$IR_FAILURE_SET
V$IR_MANUAL_CHECKLIST
V$IR_REPAIR

SYSTEM @ orcl SQL> desc V$IR_FAILURE
Name Null? Type
-------------- -------- -------
FAILURE_ID NUMBER
PARENT_ID NUMBER
CHILD_COUNT NUMBER
CLASS_NAME VARCHAR2(32)
TIME_DETECTED DATE
MODIFIED DATE
DESCRIPTION VARCHAR2(1024)
IMPACTS VARCHAR2(1024)
PRIORITY VARCHAR2(8)
STATUS VARCHAR2(12)

SYSTEM @ orcl SQL> desc V$IR_FAILURE_SET
Name Null? Type
----------------- -------- -------------
ADVISE_ID NUMBER
FAILURE_ID NUMBER
MANUAL_REPAIRS_ONLY VARCHAR2(3)

SYSTEM @ orcl SQL> desc V$IR_MANUAL_CHECKLIST
Name Null? Type
---------------- -------- -------------
ADVISE_ID NUMBER
RANK NUMBER
REQUIRED VARCHAR2(3)
MESSAGE VARCHAR2(1024)


SYSTEM @ orcl SQL> desc V$IR_REPAIR
Name Null? Type
--------------- -------- ---------------
REPAIR_ID NUMBER
ADVISE_ID NUMBER
SUMMARY VARCHAR2(32)
RANK NUMBER
TIME_DETECTED DATE
EXECUTED DATE
ESTIMATED_DATA_LOSS VARCHAR2(20)
DETAILED_DESCRIPTION VARCHAR2(1024)
REPAIR_SCRIPT VARCHAR2(512)
ESTIMATED_REPAIR_TIME NUMBER
ACTUAL_REPAIR_TIME NUMBER
STATUS VARCHAR2(7)


Let’s take a look at a failure from here:

SYSTEM @ orcl SQL> SELECT repair_id, detailed_description, repair_script FROM V$IR_REPAIR;

REPAIR_ID DETAILED_DESCRIPTION REPAIR_SCRIPT
---------- ----------------------------------- --------------------------------- 21882 The repair includes complete media
/u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2984746849.hm

No comments: