Thursday, August 06, 2009

How to install Oracle Grid Control Agents on a Windows failover cluster with no downtime

Metalink Note:464191.1 describes steps required to configure Oracle Grid Control agent in Windows failover cluster environment. Unfortunately, as part of the configuration, the cluster disk containing virtual agent's state information have to be moved to the node where the agent is being deployed.

The agent state directory has to "follow" virtual agent when a failover occurs, hence the requirement for it to be on a cluster disk resource. And since the cluster disk resource is visible on the active node only, you can not deploy virtual agent on any of the passive nodes without moving the group containing disk with state information first.

The above is not a big deal when you're doing install on a brand new or development system but what if you have to deal with a production cluster where any potential downtime that might be associated with moving the group across the nodes would better be avoided?

Let's say you have an Oracle FailSafe configuration and you intend to use Oracle Grid Control to monitor your Oracle database. In this case your virtual agent will be a part of the same cluster group where your Oracle database is. Failing over your database across all the nodes for the sake of deploying a virtual agent may not necessarily be what you want to do.

Of course, the easy workaround is to add another disk (LUN) to a cluster, use it to deploy the agents and, once the deployment has been done, add it to the same group where your database is. But what if you do not have any spare disks and have to share the same cluster disk with your Oracle database?

I gave this problem a bit of research and, as it turned out, there is a really simple workaround which may come in handy in case you'll be faced with the same problem.

I'll use the following configuration as an example:
ORA01A -- first (active) node.
ORA01B -- second (passive) node.
ORA01V -- Oracle Database VIP.
c:\oracle\product\10.2.0\agent10g -- Oracle Grid Control agent home.

Let's say that each system has a local drive C: and the deployment will be done on a cluster drive D:.

Deploy virtual agent on the active node

This is where you follow exactly what Metalink note says you to do:
C:\>emctl deploy agent -n OracleAgentORA01V d:\agent10g ORA01V:1830 ORA01A:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : d:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "d:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS
The above will create a virtual agent service named OracleAgentORA01V which will be "bound" to ORA01V virtual IP and use d:\agent10g as a location for virtual agent's state files. Note that I'm using port 1830 since port 3872 is used by a "real" agent. You can specify AgentListenOnAllNICs=FALSE in your emd.properties file (for all agents in the cluster) if you want virtual and real agents share the same port as this will stop agents from trying to listen on all network adapters on the node.

Secure the agent in case your OMS is running in the secure mode:
C:\>d:\agent10g/bin/emctl secure agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Agent is already stopped... Done.
Securing agent... Started.
Enter Agent Registration Password :
Securing agent... Successful.
Deploy virtual agent on the passive node

The same deployment command won't work on a passive node simply because drive D: is not there. As a workaround which will allow us to create a virtual agent service on the passive node we will use a local drive C: for initial deployment:
C:\>emctl deploy agent -n OracleAgentORA01V c:\agent10g ORA01V:1830 ORA01B:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : c:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "c:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS
However, this is not what we want as all virtual agents should be sharing the same cluster drive D: instead. To fix the location of the agent state directory, launch regedit.exe and navigate to
HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V
registry key. Under that key you'll find EMSTATE field with c:\agent10g as its value. Modify this value to be d:\agent10g instead. You can remove original folder as well.

Done! I found that this virtual agent will be fully operational once the group failovers to the passive node (don't forget to create a cluster resource for a virtual agent) and will be using shared state directory.

Saturday, February 07, 2009

Consistent gets from cache (fastpath) 2

Not so long time ago I wrote an article about interesting optimization in 11G which appears as consistent gets from cache (fastpath). One thing I've pointed there is that this optimization can operate only if we're accessing the same block over and over again. This may bring us to some interesting observations how certain type of queries are behaving in 11G.

HASH GROUP BY vs SORT GROUP BY

Consider the following example:
SQL> create table dept
2 (
3 dept_id number primary key,
4 dept_name varchar2(100)
5 ) organization index;

Table created

SQL> insert /*+ append */ into dept
2 select level, dbms_random.string('x', 100)
3 from dual
4 connect by level <= 10000;

10000 rows inserted

SQL> create table emp
2 (
3 emp_id number primary key,
4 dept_id references dept (dept_id),
5 emp_name varchar2(100)
6 );

Table created

SQL> insert /*+ append */ into emp
2 select level, trunc(dbms_random.value(1, 10000)), dbms_random.string('x', 100)
3 from dual
4 connect by level <= 100000;

100000 rows inserted

SQL> commit;

Commit complete

SQL> exec dbms_stats.gather_table_stats(user, 'dept');

PL/SQL procedure successfully completed

SQL> exec dbms_stats.gather_table_stats(user, 'emp');

PL/SQL procedure successfully completed
Let's say we want to output department names along with how many employees are there:
SQL> select /*+ gather_plan_statistics */ count(*)
2 from (
3 select /*+ no_merge */ d.dept_name, count(*) cnt
4 from emp e, dept d
5 where e.dept_id=d.dept_id
6 group by d.dept_name
7 );

COUNT(*)
----------
9999

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID djysxbcmwwxj3, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( select /*+
no_merge */ d.dept_name, count(*) cnt from emp e, dept d where
e.dept_id=d.dept_id group by d.dept_name )

Plan hash value: 1432452646

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08 | 11636 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 11636 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | | | |
| 3 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | 2058K| 999K| 2497K (0)|
| 4 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.02 | 11636 | | | |
| 5 | VIEW | VW_GBC_10 | 1 | 9999 | 9999 |00:00:00.02 | 1635 | | | |
| 6 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.02 | 1635 | 1207K| 1207K| 2496K (0)|
| 7 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 8 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 10001 | | | |
----------------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ITEM_1"="D"."DEPT_ID")


27 rows selected.
Note that 11G did "Group by Placement" automatically (Jonathan Lewis wrote an article about it) and our query performed 11636 consistent gets.

Now, take a look at what happens if we rewrite the query to use a sort group by instead:
SQL> select /*+ gather_plan_statistics */ count(*)
2 from (
3 with e as
4 (
5 select dept_id, count(*) cnt
6 from emp e
7 group by dept_id
8 order by dept_id
9 )
10 select /*+ no_merge */ d.dept_name, e.cnt
11 from e, dept d
12 where e.dept_id=d.dept_id
13 );

COUNT(*)
----------
9999

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2utq5vammnwa4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( with e as (
select dept_id, count(*) cnt from emp e group by dept_id order
by dept_id ) select /*+ no_merge */ d.dept_name, e.cnt from e, dept
d where e.dept_id=d.dept_id )

Plan hash value: 2732217545

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.10 | 2624 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.10 | 2624 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 3 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 4 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | | | |
| 5 | SORT GROUP BY | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | 549K| 549K| 487K (0)|
| 6 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 7 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 989 | | | |
---------------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

7 - access("E"."DEPT_ID"="D"."DEPT_ID")


27 rows selected.
What I did there is performed the same transformation myself but replaced hash group by with a sort group by. Note more than 4 times decrease in the amount of consistent gets. By looking at the plan statistics, you can tell why. Although both group by's performed the same amount of consistent gets (1635), it is a nested loops join with dep which produced all the difference -- 10001 for hash group by versus only 989 for a sort group by.

Unless you've heard about consistent gets from cache (fastpath) optimization, the above results may produce quite a bit of surprise for you. Note that due to a sorting, which has to be performed by our second query, the first query still performs better. However, in the environments which are wreaking havoc on CBC latches, the additional savings on number of consistent gets may have a potential to alleviate additional sorting expenses. In other words, the lesser are expenses for outer resultset sorting, the more appealing this could be.

Of course, before you even consider this as an optimization opportunity, keep in mind that you're relying on a specific feature which may narrow you down to a specific dot releases (or even patches) as it may change (or even completely disappear) in the next versions.

Sunday, February 01, 2009

AE enqueue

Just a quick follow up from my previous post.

The relation of AE enqueue to editions was mentioned a couple of times around the internet already. Any user session connected to a database holds AE enqueue in a shared mode...
SQL> select type, id1, lmode, sys_context('userenv', 'current_edition_id') edition_id
2 from v$lock
3 where type='AE'
4 and sid=sys_context('userenv', 'sid');

TYPE ID1 LMODE EDITION_ID
---- ---------- ---------- --------------------------------------------------------------------------------
AE 100 4 100
...and the first argument seems to be session's current_edition_id. I guess the lock mode will require an upgrade to exclusive mode during edition alterations.

Here is another interesting thing -- it looks like installing 11.1.0.7 patchset increments the current_edition_id. The value in 11.1.0.6 seems to be 99, but 11.1.0.7 changes it to 100 (you can observe it in sys.editon$ table as well). Does that mean that Oracle has any plans in doing patchset installation through edition-based redefinition (install the patchset online, short downtime is required only during switch to an upgraded edition) or is it simply a way to represent version change?

Updated the same day: it looks like in case your database was upgraded from a previous release, edition_id for ORA$BASE will be some other number as it represents ORA$BASE's object_id. This also means that my initial assumption about patchset installation changing edition_id is not correct as it is just whatever object_id is being available at the time. Before ORA$BASE edition is being created, 11.1.0.7 creates one more object (compared to 11.1.0.6), index I_SYN2, which explains advance in edition_id.

However, the idea seems to be interesting anyway...

Saturday, January 31, 2009

Edition-based redifinition (a speculation)

As you can see from this link, there is a feature called Edition-based redefinition mentioned in the 11GR1 documentation. Unfortunately, all it says is that this feature is unavailable as of now so there is really not much you can tell about it.

A speculation

I'm usually not a great fun of doing any sort of a guesswork, however, in this case it might be an interesting exercise to observe some bits and pieces presented in the current release of 11GR1 in order to see whether it can give us some clues as to what expect from this new feature as well as what the potential underpinning might look like. As well as prepare yourself for some implications...

SYS.OBJ$

Perhaps this will be the first thing what you'll notice as soon as 11G's data dictionary is concerned. For example, this is how dba_synonyms view is defined in 10GR2:
create or replace view dba_synonyms
(owner, synonym_name, table_owner, table_name, db_link)
as
select u.name, o.name, s.owner, s.name, s.node
from sys.user$ u, sys.syn$ s, sys.obj$ o
where o.obj# = s.obj#
and o.type# = 5
and o.owner# = u.user#;
Now, take a look at the same view's definition in 11GR1:
create or replace view dba_synonyms
(owner, synonym_name, table_owner, table_name, db_link)
as
select u.name, o.name, s.owner, s.name, s.node
from sys.user$ u, sys.syn$ s, sys."_CURRENT_EDITION_OBJ" o
where o.obj# = s.obj#
and o.type# = 5
and o.owner# = u.user#;
For you see, the reference to SYS.OBJ$ was replaced with SYS."_CURRENT_EDITION_OBJ". This replacement occurs all around the place in the data dictionary which alone makes it interesting enough to attract a bit of attention.

SYS._CURRENT_EDITION_OBJ

What is _CURRENT_EDITION_OBJ? It's a view:
SQL> select object_type
2 from dba_objects
3 where object_name='_CURRENT_EDITION_OBJ';

OBJECT_TYPE
-------------------
VIEW
Let's take a look at this view's definition (I've omitted fields list for the sake of clarity):
select ...
from obj$ o, user$ u
where o.owner# = u.user#
and ( /* non-versionable object */
( o.type# not in (4,5,7,8,9,10,11,12,13,14,22,87,88)
or bitand(u.spare1, 16) = 0)
/* versionable object visible in current edition */
or ( o.type# in (4,5,7,8,9,10,11,12,13,14,22,87)
and ( (u.type# <> 2 and
sys_context('userenv', 'current_edition_name') = 'ORA$BASE')
or (u.type# = 2 and
u.spare2 = sys_context('userenv', 'current_edition_id'))
or exists (select 1 from obj$ o2, user$ u2
where o2.type# = 88
and o2.dataobj# = o.obj#
and o2.owner# = u2.user#
and u2.type# = 2
and u2.spare2 =
sys_context('userenv', 'current_edition_id'))
)
)
);
First of all, it mentions about (A) versionable objects and (B) current edition which is referenced through session's context sys_context('userenv', 'current_edition_id').

Versionable objects

Only the objects of the following types seems to be versionable:
  • VIEW
  • SYNONYM
  • PROCEDURE
  • FUNCTION
  • PACKAGE
  • PACKAGE BODY
  • TRIGGER
  • TYPE
  • TYPE BODY
  • LIBRARY
  • ASSEMBLY
Object type# = 88 seems to be something special as I couldn't find what it is from sql.bsq (nor dba_objects) and it is being referenced using somewhat special way: if obj# matches dataobj# of some entry with type# = 88 and this entry is part if the current edition, the object is considered to be a part of the current edition as well. Note that the join itself permits this special object to exist in some other schema, though I don't know whether this has any practical meaning or not.

New userenv attributes

There are two (at least) new userenv context attributes:
SQL> select sys_context('userenv', 'current_edition_name')
2 current_edition_name from dual;

CURRENT_EDITION_NAME
--------------------------------------------------------------------------------
ORA$BASE
SQL> select sys_context('userenv', 'current_edition_id')
2 current_edition_id from dual;

CURRENT_EDITION_ID
--------------------------------------------------------------------------------
100
Changing these will change data returned by _CURRENT_EDITION_OBJ view as well and, given that that view is referenced instead of obj$, it looks like these contexts will be the primary mechanism responsible for referencing one version (edition) of the object or the other.

New user type

Again, by looking at the view definition, we may spot that versionable objects are being concerned as far as user$.type# = 2. To remind you, 0 is a role and 1 is a regular user, so there will be something new, perhaps this new feature could be enabled on a per-user level and/or versioned objects will be owned by some other "ghost" schema (my comment about object with type 88 seems to be allowing this, at least technically).

Some interesting clues might be get from _CURRENT_EDITION_OBJ's field definition itself:
when (u.type# = 2) then
(select eo.name from obj$ eo where eo.obj# = u.spare2)
else
'ORA$BASE'
end
The objects which are owned by this new user type will have editition_id as their object_id and will be named after edition name. That also means that object_id is no longer unique:
create unique index i_obj1 on obj$(obj#, owner#, type#) (11GR1)
create unique index i_obj1 on obj$(obj#) (10GR2)
(from sql.bsq)
I wander whether this have some chances to break legacy code which references sys.obj$ by obj# and expect one row to be returned at most...

Does all that, again, points out that object versions will be kept under this special user and will be linked back using data_object_id? Is this also a reason why objects which do have "proper" data_object_id are not versionable?

A speculation (again)

Pleases note that everything said in this post is based on some (random) observations of preliminary data available in the current release (11.1.0.7). There are chances that some (if not all) of this data may became altered or even completely invalid when this feature will be officially announced.

However, one thing seems to be clear -- in order to support this new feature, some serious alterations of the data dictionary are taking place and you should keep yourself alerted...

Create database or who wants some DMT?

Simpler than ever

Starting from Oracle 10G, creating the database can be as simple as this:
SQL> create database;

Database created.
It's not a surprise that this feature was somewhat advertised here and there. However, what was missing in these advertisements is this:
SQL> select name, decode(bitmapped, 0, 'DMT', 'LMT')
from ts$
order by name; 2 3

NAME DEC
------------------------------ ---
SYSAUX LMT
SYSTEM DMT
SYS_UNDOTS LMT
For you see, create database will make your SYSTEM tablespace to be dictionary managed by default. I don't really know if there are any reasons for this and since everything else will default to LMT, this should not be a big deal for most of you anyway. Just don't forget that SYSTEM hosts objects like AUD$ (audit log), FGA_LOG$ (fine-grained audit log) or NCOMP_DLL$ (natively compiled objects) which could grow to a fairly large number of extents.

I was a bit surprised watching this relic appear even when you do this in 11.1.0.7...

Friday, January 30, 2009

Moving a datafile

Sometimes you need to move a datafile into a different mount point or ASM diskgroup. This could make you wandering what technique you can use in order to minimize downtime. I'll show you one of my favorite methods which works well under certain circumstances.

Let's say you want to move the following datafile...
SQL> select file_id, file_name
2 from dba_data_files
3 where tablespace_name='USERS';

FILE_ID FILE_NAME
------- --------------------------------------------------------
4 /u01/oradata/ORA11GR1/datafile/o1_mf_users_4q759m64_.dbf
...into mount point /u02.

Backup as copy

The first thing we need to do is backup this datafile as copy using RMAN:
[oracle@ora11gr1a ~]$ rman

Recovery Manager: Release 11.1.0.7.0 - Production on Thu Jan 29 17:29:59 2009

Copyright (c) 1982, 2007, Oracle. All rights reserved.

RMAN> connect target;

connected to target database: ORA11GR1 (DBID=3707369966)

RMAN> backup as copy datafile 4
2> format '/u02/oradata/ORA11GR1/datafile/users01.dbf';

Starting backup at 29-JAN-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users
_4q759m64_.dbf
output file name=/u02/oradata/ORA11GR1/datafile/users01.dbf tag=TAG20090129T1825
32 RECID=14 STAMP=677442334
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 29-JAN-09
Rollforward image copy

Since switching to datafile copy will require datafile recover, it might be a good idea to rollforward this image copy first, in order to bring it up to date...
RMAN> list copy of datafile 4;

List of Datafile Copies
=======================

Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
14 4 A 29-JAN-09 2033489 29-JAN-09
Name: /u02/oradata/ORA11GR1/datafile/users01.dbf
Tag: TAG20090129T182532

RMAN> backup incremental from scn 2033489 datafile 4 format '/u02/oradata/ORA11G
R1/datafile/%U';

Starting backup at 29-JAN-09

using channel ORA_DISK_1
backup will be obsolete on date 05-FEB-09
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users
_4q759m64_.dbf
channel ORA_DISK_1: starting piece 1 at 29-JAN-09
channel ORA_DISK_1: finished piece 1 at 29-JAN-09
piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag=TAG20090129T183004
comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

using channel ORA_DISK_1
backup will be obsolete on date 05-FEB-09
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 29-JAN-09
channel ORA_DISK_1: finished piece 1 at 29-JAN-09
piece handle=/u02/oradata/ORA11GR1/datafile/1ok61t1e_1_1 tag=TAG20090129T183004
comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 29-JAN-09

RMAN> recover copy of datafile 4;

Starting recover at 29-JAN-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy file number=00004 name=/u02/oradata/ORA11GR1/datafile/u
sers01.dbf
channel ORA_DISK_1: reading from backup piece /u02/oradata/ORA11GR1/datafile/1nk
61t1d_1_1
channel ORA_DISK_1: piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag
=TAG20090129T183004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished recover at 29-JAN-09

RMAN> list copy of datafile 4;

List of Datafile Copies
=======================

Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
15 4 A 29-JAN-09 2033633 29-JAN-09
Name: /u02/oradata/ORA11GR1/datafile/users01.dbf
Tag: TAG20090129T182532
Note that image copy's checkpoint SCN has moved forward. Keep in mind that this step generally makes sense only if you have block change tracking enabled and/or there is a huge amount of archivelogs to apply, as it will be a trade-off between creating and applying the incremental backup compared to directly applying all necessarily archivelogs. This step can be done starting from Oracle 10G.

Switch datafile

All we have to do now is execute small RMAN block...
RMAN> run
2> {
3> sql 'alter database datafile 4 offline';
4> switch datafile 4 to datafilecopy '/u02/oradata/ORA11GR1/datafile/users0
1.dbf';
5> recover datafile 4;
6> sql 'alter database datafile 4 online';
7> }

sql statement: alter database datafile 4 offline

datafile 4 switched to datafile copy
input datafile copy RECID=15 STAMP=677442659 file name=/u02/oradata/ORA11GR1/dat
afile/users01.dbf

Starting recover at 29-JAN-09
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 29-JAN-09

sql statement: alter database datafile 4 online
This is where you'll have some downtime. The amount of downtime depends on how long it will take to recover the datafile which will generally be a function of how many archivelogs needs to be applied which, in turn, can be reduced by using incremental backup. The point is that this step can be really fast.

Don't forget to watch against nologging operations!

Sunday, January 25, 2009

Update and rownum oddity

Take a look at the following table:
SQL> create table codes
2 (
3 code varchar2(10),
4 used number,
5 constraint pk_codes primary key (used, code)
6 ) organization index;

Table created.

SSQL> insert into codes
2 select dbms_random.string('x', 10),
3 case when level <= 5000 then 1 else 0 end
4 from dual
5 connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'codes');

PL/SQL procedure successfully completed.
This table contains a set of codes with used column representing whether the code was already used (1) or not (0). We need to return one (random) unused code from the above table and mark this code as used. This is very easy to archive using the following update statement:
SQL> variable code varchar2(10);
SQL> set autot traceonly
SQL> update codes set used=1
2 where used=0 and rownum=1
3 returning code into :code;

1 row updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 1169687698

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 13 | 117 (1)| 00:00:02 |
| 1 | UPDATE | CODES | | | | |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | INDEX FAST FULL SCAN| PK_CODES | 50000 | 634K| 117 (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM=1)
3 - filter("USED"=0)


Statistics
----------------------------------------------------------
1 recursive calls
5 db block gets
12 consistent gets
0 physical reads
124 redo size
913 bytes sent via SQL*Net to client
874 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
However, the plan is not exactly what I would expect (I'm running this on 11.1.0.7)... Why do IFFS when we can do IRS to get only one row? This is exactly what regular select does, after all:
SQL> select *
2 from codes
3 where used=0 and rownum=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 802332609

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
|* 2 | INDEX RANGE SCAN| PK_CODES | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM=1)
2 - access("USED"=0)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

10053 trace

Let's take a look at 10053 trace output for both select and update statements. I'm picking up relevant sections.

UPDATE:
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: CODES Alias: CODES
#Rows: 100000 #Blks: 423 AvgRowLen: 13.00
Index Stats::
Index: PK_CODES Col#: 2 1
LVLS: 1 #LB: 423 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 0.00
Access path analysis for CODES
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CODES[CODES]
Table: CODES Alias: CODES
Card: Original: 100000.000000 Rounded: 50000 Computed: 50000.00 Non Adjusted: 50000.00
Access Path: index (index (FFS))
Index: PK_CODES
resc_io: 116.00 resc_cpu: 20012369
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 116.81 Resp: 116.81 Degree: 1
Cost_io: 116.00 Cost_cpu: 20012369
Resp_io: 116.00 Resp_cpu: 20012369
OPTIMIZER PERCENT INDEX CACHING = 0


Access Path: index (IndexOnly)
Index: PK_CODES
resc_io: 213.00 resc_cpu: 11516867
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 213.47 Resp: 213.47 Degree: 1
Best:: AccessPath: IndexFFS
Index: PK_CODES
Cost: 116.81 Degree: 1 Resp: 116.81 Card: 50000.00 Bytes: 0

***************************************
Nothing else is being tried and this is what optimizer selects as the best execution plan. Note how cardinalities are being reported (highlighted in red).

SELECT:

In addition to the above, has one more section:
***************************************
SINGLE TABLE ACCESS PATH (First K Rows)
Single Table Cardinality Estimation for CODES[CODES]
Table: CODES Alias: CODES
Card: Original: 2.000000 Rounded: 1 Computed: 1.00 Non Adjusted: 1.00
Access Path: index (index (FFS))
Index: PK_CODES
resc_io: 2.00 resc_cpu: 7461
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 2.00 Resp: 2.00 Degree: 1
Cost_io: 2.00 Cost_cpu: 7461
Resp_io: 2.00 Resp_cpu: 7461
OPTIMIZER PERCENT INDEX CACHING = 0


Access Path: index (IndexOnly)
Index: PK_CODES
resc_io: 2.00 resc_cpu: 14443
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange
Index: PK_CODES
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 1.00 Bytes: 13

First K Rows: unchanged join prefix len = 1
Join order[1]: CODES[CODES]#0
***********************
Note how cardinalities has changed (highlighted in green) this time. What happened is rownum = 1 predicate resulted in fist_rows(1) mode (highlighted in blue), affecting how cardinalities were calculated.

We know that first_rows(n) hint is being ignored in update and delete statements, thus our update statement always goes in all_rows mode.

You can confirm that select behaves exactly the same way when in all_rows mode:
SQL> select /*+ all_rows */ *
2 from codes
3 where used=0 and rownum=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 2682988822

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 117 (1)| 00:00:02 |
|* 1 | COUNT STOPKEY | | | | | |
|* 2 | INDEX FAST FULL SCAN| PK_CODES | 50000 | 634K| 117 (1)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM=1)
2 - filter("USED"=0)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Well, I guess here goes my next wish for CBO improvement regarding how update and delete statements are handled with predicates involving rownum...

Friday, January 23, 2009

11G Managed Recovery Process

MRP process is commonly referenced throughout the web as the process which performs redo apply to your managed standby database. MRP may can in a team with PQ slave (or prnn in 11G) processes in case you start managed recovery in parallel.

Unfortunately, the term performs redo apply seems to be causing some confusion along the way as well. I found it very common that people believes that it's MRP process which performs both reads from the redo streams as well as writes changes into datafiles. For example, sometimes they try to battle slow log apply by increasing managed recovery parallelism without realizing that there is in fact a bit more to the puzzle.

Some details

I'm going to use my 11G DataGuard setup to demonstrate a couple of key points. My setup is operating using real time apply (no parallel), which makes the entire example a bit simpler to demonstrate.

Let's update a row on the source DB:
SQL> update t set n=n;

1 row updated.

SQL> commit;

Commit complete.
Now, take a look at MRP strace output which was produced as a result of the above change:
[oracle@ora11gr1b fd]$ ps -fp 6364
UID PID PPID C STIME TTY TIME CMD
oracle 6364 1 0 19:59 ? 00:00:00 ora_mrp0_ora11gr1
[oracle@ora11gr1b fd]$ strace -e pread,pwrite -p 6364
Process 6364 attached - interrupt to quit
pread(32,..., 512, 45568) = 512
pread(32,..., 1024, 46080) = 1024
pread(30,..., 8192, 2228224) = 8192
pread(30,..., 8192, 259858432) = 8192
pread(31,..., 8192, 9461760) = 8192
I've set filesystemio_options=none so we can observe pread/pwrite syscalls which are easier to follow compared to asynch io_submit/io_getvents system calls (and we don't care about O_DIRECT flag either).

Let's check what are these file descriptors:
[oracle@ora11gr1b fd]$ cd /proc/6364/fd
[oracle@ora11gr1b fd]$ file 30
30: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_undotbs1_0fk5fp2c_.dbf'
[oracle@ora11gr1b fd]$ file 31
31: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_users_0ik5fp4u_.dbf'
[oracle@ora11gr1b fd]$ file 32
32: symbolic link to `/u01/oradata/ORA11GR1B/onlinelog/o1_mf_9_4qn2rkhk_.log'
In other words, the process read from standby logfile, undo and users (this is where our table is) tablespaces. However, as you might notice, all these calls were reads, we didn't write anything.

From time to time MRP gets a bit more interesting, for example during logfile switches:
...
pwrite(27,..., 16384, 16384) = 16384
pread(27,..., 16384, 16384) = 16384
pread(28,..., 8192, 8192) = 8192
pread(29,..., 8192, 8192) = 8192
pread(30,..., 8192, 8192) = 8192
pread(31,..., 8192, 8192) = 8192
pread(27,..., 16384, 393216) = 16384
pwrite(28,..., 8192, 8192) = 8192
pwrite(29,..., 8192, 8192) = 8192
pwrite(30,..., 8192, 8192) = 8192
pwrite(31,..., 8192, 8192) = 8192
...
Here we actually wrote (27 is a controlfile, 28 and 29 are system and sysaux tablespaces respectively) something. However, from the offset (fourth parameter) you can realize that we are writing to the second block in these datafiles. There is no (can't be) any user data there.

Who is writing the data then?

The first thing you might want to check is, of course, the database writer process:
[oracle@ora11gr1b ~]$ ps -fp 6303
UID PID PPID C STIME TTY TIME CMD
oracle 6303 1 0 19:55 ? 00:00:00 ora_dbw0_ora11gr1
[oracle@ora11gr1b ~]$ strace -e pread,pwrite -p 6303
Process 6303 attached - interrupt to quit
pwrite(23,..., 8192, 2097152) = 8192
pwrite(23,..., 8192, 35987456) = 8192
pwrite(24,..., 8192, 9461760) = 8192
This is the output produced by standby's dbwr right after we updated our table on the source. We wrote two undo blocks (23) and one block in users tablespace (24). By looking at the offset for file descriptor 24 we can confirm that we wrote the table itself:
SQL> select segment_name
2 from dba_extents
3 where tablespace_name='USERS'
4 and 9461760/8192 between block_id and block_id + blocks-1;

SEGMENT_NAME
--------------------------
T
From the above you can confirm that it is DBWR process which wrote the changes for us and it plays crucial role during your standby database operations.

MRP's workload consists mostly from reading the redo stream, datafiles, controlfiles and occasional writes into the controlfile and datafiles header.

If your standby is suffering from the redo apply performance, you may want to pay attention to both MRP and DBWR processes.