Thursday, August 06, 2009

How to install Oracle Grid Control Agents on a Windows failover cluster with no downtime

Metalink Note 464191.1 describes the steps required to configure an Oracle Grid Control agent in a Windows failover cluster environment. Unfortunately, as part of the configuration, the cluster disk containing the virtual agent's state information has to be moved to the node where the agent is being deployed.

The agent state directory has to "follow" the virtual agent when a failover occurs, hence the requirement for it to be on a cluster disk resource. And since the cluster disk resource is visible on the active node only, you cannot deploy a virtual agent on any of the passive nodes without first moving the group containing the disk with the state information.

The above is not a big deal when you're doing an install on a brand new or development system, but what if you have to deal with a production cluster, where any downtime associated with moving the group across the nodes is better avoided?

Let's say you have an Oracle FailSafe configuration and you intend to use Oracle Grid Control to monitor your Oracle database. In this case your virtual agent will be part of the same cluster group as your Oracle database. Failing your database over across all the nodes just for the sake of deploying a virtual agent may not be what you want to do.

Of course, the easy workaround is to add another disk (LUN) to the cluster, use it to deploy the agents and, once the deployment is done, add it to the same group as your database. But what if you do not have any spare disks and have to share the same cluster disk with your Oracle database?

I gave this problem a bit of research and, as it turns out, there is a really simple workaround which may come in handy if you're faced with the same problem.

I'll use the following configuration as an example:
ORA01A -- first (active) node.
ORA01B -- second (passive) node.
ORA01V -- Oracle Database VIP.
c:\oracle\product\10.2.0\agent10g -- Oracle Grid Control agent home.

Let's say that each system has a local drive C: and the deployment will be done on a cluster drive D:.

Deploy virtual agent on the active node

This is where you follow exactly what the Metalink note tells you to do:
C:\>emctl deploy agent -n OracleAgentORA01V d:\agent10g ORA01V:1830 ORA01A:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : d:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "d:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS
The above creates a virtual agent service named OracleAgentORA01V which is "bound" to the ORA01V virtual IP and uses d:\agent10g as the location for the virtual agent's state files. Note that I'm using port 1830 since port 3872 is used by the "real" agent. If you want the virtual and real agents to share the same port, you can specify AgentListenOnAllNICs=FALSE in the emd.properties file (for all agents in the cluster), which stops the agents from trying to listen on all network adapters on the node.
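For reference, the relevant emd.properties entry would look something like this (the agent home path is from the example configuration above; adjust it for your environment):

```
# <agent home>\sysman\config\emd.properties
#
# Stop the agent from listening on all network adapters,
# so that virtual and real agents can share the same port.
AgentListenOnAllNICs=FALSE
```

Remember that this has to be set for every agent in the cluster, real and virtual.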

Secure the agent if your OMS is running in secure mode:
C:\>d:\agent10g/bin/emctl secure agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Agent is already stopped... Done.
Securing agent... Started.
Enter Agent Registration Password :
Securing agent... Successful.
Deploy virtual agent on the passive node

The same deployment command won't work on a passive node simply because drive D: is not there. As a workaround that will allow us to create the virtual agent service on the passive node, we will use local drive C: for the initial deployment:
C:\>emctl deploy agent -n OracleAgentORA01V c:\agent10g ORA01V:1830 ORA01B:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : c:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "c:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS
However, this is not what we want, as all virtual agents should share the same cluster drive D: instead. To fix the location of the agent state directory, launch regedit.exe and navigate to the
HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V
registry key. Under that key you'll find the EMSTATE value set to c:\agent10g. Change this value to d:\agent10g. You can remove the original c:\agent10g folder as well.
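If you prefer to script the change instead of editing the registry by hand, a .reg file along these lines should do (a sketch based on the key and value names above; I'm assuming EMSTATE is a plain string value):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V]
"EMSTATE"="d:\\agent10g"
```

Import it with `regedit /s emstate.reg` on the passive node after the initial deployment.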

Done! I found that this virtual agent becomes fully operational once the group fails over to the passive node (don't forget to create a cluster resource for the virtual agent) and uses the shared state directory.

Saturday, February 07, 2009

Consistent gets from cache (fastpath) 2

Not so long ago I wrote an article about an interesting optimization in 11G which appears as consistent gets from cache (fastpath). One thing I pointed out there is that this optimization can operate only if we're accessing the same block over and over again. This brings us to some interesting observations about how certain types of queries behave in 11G.

HASH GROUP BY vs SORT GROUP BY

Consider the following example:
SQL> create table dept
2 (
3 dept_id number primary key,
4 dept_name varchar2(100)
5 ) organization index;

Table created

SQL> insert /*+ append */ into dept
2 select level, dbms_random.string('x', 100)
3 from dual
4 connect by level <= 10000;

10000 rows inserted

SQL> create table emp
2 (
3 emp_id number primary key,
4 dept_id references dept (dept_id),
5 emp_name varchar2(100)
6 );

Table created

SQL> insert /*+ append */ into emp
2 select level, trunc(dbms_random.value(1, 10000)), dbms_random.string('x', 100)
3 from dual
4 connect by level <= 100000;

100000 rows inserted

SQL> commit;

Commit complete

SQL> exec dbms_stats.gather_table_stats(user, 'dept');

PL/SQL procedure successfully completed

SQL> exec dbms_stats.gather_table_stats(user, 'emp');

PL/SQL procedure successfully completed
Let's say we want to output department names along with the number of employees in each:
SQL> select /*+ gather_plan_statistics */ count(*)
2 from (
3 select /*+ no_merge */ d.dept_name, count(*) cnt
4 from emp e, dept d
5 where e.dept_id=d.dept_id
6 group by d.dept_name
7 );

COUNT(*)
----------
9999

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID djysxbcmwwxj3, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( select /*+
no_merge */ d.dept_name, count(*) cnt from emp e, dept d where
e.dept_id=d.dept_id group by d.dept_name )

Plan hash value: 1432452646

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08 | 11636 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 11636 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | | | |
| 3 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | 2058K| 999K| 2497K (0)|
| 4 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.02 | 11636 | | | |
| 5 | VIEW | VW_GBC_10 | 1 | 9999 | 9999 |00:00:00.02 | 1635 | | | |
| 6 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.02 | 1635 | 1207K| 1207K| 2496K (0)|
| 7 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 8 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 10001 | | | |
----------------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ITEM_1"="D"."DEPT_ID")


27 rows selected.
Note that 11G did "Group by Placement" automatically (Jonathan Lewis wrote an article about it) and our query performed 11636 consistent gets.
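If you want to compare this against a plan without the transformation, the no_place_group_by hint should disable Group By Placement (a sketch; I haven't run this against the test case above):

```sql
select /*+ gather_plan_statistics */ count(*)
  from (
        -- no_place_group_by should stop the optimizer from pushing
        -- the group by into the join (Group By Placement)
        select /*+ no_merge no_place_group_by */ d.dept_name, count(*) cnt
          from emp e, dept d
         where e.dept_id = d.dept_id
         group by d.dept_name
       );
```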

Now, take a look at what happens if we rewrite the query to use a sort group by instead:
SQL> select /*+ gather_plan_statistics */ count(*)
2 from (
3 with e as
4 (
5 select dept_id, count(*) cnt
6 from emp e
7 group by dept_id
8 order by dept_id
9 )
10 select /*+ no_merge */ d.dept_name, e.cnt
11 from e, dept d
12 where e.dept_id=d.dept_id
13 );

COUNT(*)
----------
9999

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2utq5vammnwa4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( with e as (
select dept_id, count(*) cnt from emp e group by dept_id order
by dept_id ) select /*+ no_merge */ d.dept_name, e.cnt from e, dept
d where e.dept_id=d.dept_id )

Plan hash value: 2732217545

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.10 | 2624 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.10 | 2624 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 3 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 4 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | | | |
| 5 | SORT GROUP BY | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | 549K| 549K| 487K (0)|
| 6 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 7 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 989 | | | |
---------------------------------------------------------------------------------------------------------------------------------


PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

7 - access("E"."DEPT_ID"="D"."DEPT_ID")


27 rows selected.
What I did there is perform the same transformation myself but replace the hash group by with a sort group by. Note the more than fourfold decrease in the number of consistent gets. By looking at the plan statistics, you can tell why. Although both group bys performed the same number of consistent gets (1635), it is the nested loops join with dept which produced all the difference: 10001 buffers for the hash group by versus only 989 for the sort group by.

Unless you've heard about the consistent gets from cache (fastpath) optimization, the above results may come as quite a surprise. Note that due to the sorting which has to be performed by our second query, the first query still performs better overall. However, in environments which are wreaking havoc on cache buffers chains latches, the savings in consistent gets have the potential to outweigh the additional sorting expense. In other words, the lower the cost of sorting the outer result set, the more appealing this becomes.

Of course, before you even consider this as an optimization opportunity, keep in mind that you'd be relying on a specific feature which may tie you down to specific dot releases (or even patches), as it may change (or disappear completely) in future versions.

Sunday, February 01, 2009

AE enqueue

Just a quick follow up from my previous post.

The relation of the AE enqueue to editions has been mentioned a couple of times around the internet already. Any user session connected to a database holds the AE enqueue in shared mode...
SQL> select type, id1, lmode, sys_context('userenv', 'current_edition_id') edition_id
2 from v$lock
3 where type='AE'
4 and sid=sys_context('userenv', 'sid');

TYPE ID1 LMODE EDITION_ID
---- ---------- ---------- --------------------------------------------------------------------------------
AE 100 4 100
...and ID1 seems to be the session's current_edition_id. I guess the lock mode will require an upgrade to exclusive during edition alterations.

Here is another interesting thing: it looks like installing the 11.1.0.7 patchset increments the current_edition_id. The value in 11.1.0.6 seems to be 99, but 11.1.0.7 changes it to 100 (you can observe this in the sys.edition$ table as well). Does that mean Oracle has plans to do patchset installations through edition-based redefinition (install the patchset online, with a short downtime required only during the switch to the upgraded edition), or is it simply a way to represent a version change?

Updated the same day: it looks like if your database was upgraded from a previous release, the edition_id for ORA$BASE will be some other number, as it is simply ORA$BASE's object_id. This also means that my initial assumption about the patchset installation changing the edition_id is not correct: it is just whatever object_id happens to be available at the time. Before the ORA$BASE edition is created, 11.1.0.7 creates one more object (compared to 11.1.0.6), the index I_SYN2, which explains the advance in edition_id.

However, the idea seems to be interesting anyway...

Saturday, January 31, 2009

Edition-based redefinition (a speculation)

As you can see from this link, there is a feature called edition-based redefinition mentioned in the 11GR1 documentation. Unfortunately, all it says is that this feature is unavailable as of now, so there is really not much you can tell about it.

A speculation

I'm usually not a great fan of doing any sort of guesswork; however, in this case it might be an interesting exercise to observe some bits and pieces present in the current release of 11GR1 to see whether they give us some clues about what to expect from this new feature and what the potential underpinnings might look like. As well as to prepare yourself for some implications...

SYS.OBJ$

Perhaps this is the first thing you'll notice when examining 11G's data dictionary. For example, this is how the dba_synonyms view is defined in 10GR2:
create or replace view dba_synonyms
(owner, synonym_name, table_owner, table_name, db_link)
as
select u.name, o.name, s.owner, s.name, s.node
from sys.user$ u, sys.syn$ s, sys.obj$ o
where o.obj# = s.obj#
and o.type# = 5
and o.owner# = u.user#;
Now, take a look at the same view's definition in 11GR1:
create or replace view dba_synonyms
(owner, synonym_name, table_owner, table_name, db_link)
as
select u.name, o.name, s.owner, s.name, s.node
from sys.user$ u, sys.syn$ s, sys."_CURRENT_EDITION_OBJ" o
where o.obj# = s.obj#
and o.type# = 5
and o.owner# = u.user#;
As you can see, the reference to SYS.OBJ$ was replaced with SYS."_CURRENT_EDITION_OBJ". This replacement occurs all over the data dictionary, which alone makes it interesting enough to attract a bit of attention.
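To get an idea of how widespread the replacement is, something like the following should list the dictionary objects that depend on the new view (a sketch; run it as a user with dictionary access):

```sql
-- dictionary objects rebuilt on top of _CURRENT_EDITION_OBJ
select owner, name, type
  from dba_dependencies
 where referenced_owner = 'SYS'
   and referenced_name = '_CURRENT_EDITION_OBJ'
 order by owner, name;
```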

SYS._CURRENT_EDITION_OBJ

What is _CURRENT_EDITION_OBJ? It's a view:
SQL> select object_type
2 from dba_objects
3 where object_name='_CURRENT_EDITION_OBJ';

OBJECT_TYPE
-------------------
VIEW
Let's take a look at this view's definition (I've omitted the field list for the sake of clarity):
select ...
from obj$ o, user$ u
where o.owner# = u.user#
and ( /* non-versionable object */
( o.type# not in (4,5,7,8,9,10,11,12,13,14,22,87,88)
or bitand(u.spare1, 16) = 0)
/* versionable object visible in current edition */
or ( o.type# in (4,5,7,8,9,10,11,12,13,14,22,87)
and ( (u.type# <> 2 and
sys_context('userenv', 'current_edition_name') = 'ORA$BASE')
or (u.type# = 2 and
u.spare2 = sys_context('userenv', 'current_edition_id'))
or exists (select 1 from obj$ o2, user$ u2
where o2.type# = 88
and o2.dataobj# = o.obj#
and o2.owner# = u2.user#
and u2.type# = 2
and u2.spare2 =
sys_context('userenv', 'current_edition_id'))
)
)
);
First of all, it mentions (A) versionable objects and (B) the current edition, which is referenced through the session's context sys_context('userenv', 'current_edition_id').

Versionable objects

Only objects of the following types seem to be versionable:
  • VIEW
  • SYNONYM
  • PROCEDURE
  • FUNCTION
  • PACKAGE
  • PACKAGE BODY
  • TRIGGER
  • TYPE
  • TYPE BODY
  • LIBRARY
  • ASSEMBLY
Object type# = 88 seems to be something special, as I couldn't find what it is in sql.bsq (nor in dba_objects), and it is referenced in a somewhat special way: if obj# matches the dataobj# of some entry with type# = 88 and that entry is part of the current edition, the object is considered to be part of the current edition as well. Note that the join itself permits this special object to exist in some other schema, though I don't know whether this has any practical meaning.

New userenv attributes

There are (at least) two new userenv context attributes:
SQL> select sys_context('userenv', 'current_edition_name')
2 current_edition_name from dual;

CURRENT_EDITION_NAME
--------------------------------------------------------------------------------
ORA$BASE
SQL> select sys_context('userenv', 'current_edition_id')
2 current_edition_id from dual;

CURRENT_EDITION_ID
--------------------------------------------------------------------------------
100
Changing these will change the data returned by the _CURRENT_EDITION_OBJ view as well and, given that this view is referenced instead of obj$, it looks like these context attributes will be the primary mechanism responsible for referencing one version (edition) of an object or another.

New user type

Again, by looking at the view definition, we may spot that versionable objects are only considered when user$.type# = 2. To remind you, 0 is a role and 1 is a regular user, so there will be something new; perhaps this new feature could be enabled on a per-user level and/or versioned objects will be owned by some other "ghost" schema (my comment about the object with type# 88 seems to allow this, at least technically).

Some interesting clues might be gleaned from one of _CURRENT_EDITION_OBJ's field definitions (an excerpt):
when (u.type# = 2) then
(select eo.name from obj$ eo where eo.obj# = u.spare2)
else
'ORA$BASE'
end
The objects which are owned by this new user type will have the edition_id as their object_id and will be named after the edition name. That also means that object_id is no longer unique:
create unique index i_obj1 on obj$(obj#, owner#, type#) (11GR1)
create unique index i_obj1 on obj$(obj#) (10GR2)
(from sql.bsq)
I wonder whether this has a chance to break legacy code which references sys.obj$ by obj# and expects at most one row to be returned...

Does all that, again, point to object versions being kept under this special user and linked back using data_object_id? Is this also the reason why objects which do have a "proper" data_object_id are not versionable?

A speculation (again)

Please note that everything said in this post is based on some (random) observations of preliminary data available in the current release (11.1.0.7). There is a chance that some (if not all) of this data may be altered or become completely invalid when this feature is officially announced.

However, one thing seems clear: in order to support this new feature, some serious alterations of the data dictionary are taking place, and you should keep yourself alert...

Create database or who wants some DMT?

Simpler than ever

Starting with Oracle 10G, creating a database can be as simple as this:
SQL> create database;

Database created.
It's no surprise that this feature was somewhat advertised here and there. However, what was missing from these advertisements is this:
SQL> select name, decode(bitmapped, 0, 'DMT', 'LMT')
2 from ts$
3 order by name;

NAME DEC
------------------------------ ---
SYSAUX LMT
SYSTEM DMT
SYS_UNDOTS LMT
As you can see, create database makes your SYSTEM tablespace dictionary managed by default. I don't really know if there are any reasons for this, and since everything else defaults to LMT, this should not be a big deal for most of you anyway. Just don't forget that SYSTEM hosts objects like AUD$ (the audit trail), FGA_LOG$ (the fine-grained audit trail) or NCOMP_DLL$ (natively compiled objects) which can grow to a fairly large number of extents.
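If a dictionary-managed SYSTEM bothers you, it can be migrated to locally managed with dbms_space_admin. Note that this is a one-way operation with a number of prerequisites (restricted mode, other tablespaces read-only or offline, and so on), so treat the following as a sketch and check the documentation first:

```sql
-- one-way migration of SYSTEM to locally managed extents;
-- requires the database to be in restricted mode and, among
-- other things, all other tablespaces to be read-only or offline
exec dbms_space_admin.tablespace_migrate_to_local('SYSTEM');
```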

I was a bit surprised to watch this relic appear even when you do this in 11.1.0.7...

Friday, January 30, 2009

Moving a datafile

Sometimes you need to move a datafile to a different mount point or ASM diskgroup. This could leave you wondering what technique you can use to minimize downtime. I'll show you one of my favorite methods, which works well under certain circumstances.

Let's say you want to move the following datafile...
SQL> select file_id, file_name
2 from dba_data_files
3 where tablespace_name='USERS';

FILE_ID FILE_NAME
------- --------------------------------------------------------
4 /u01/oradata/ORA11GR1/datafile/o1_mf_users_4q759m64_.dbf
...into mount point /u02.

Backup as copy

The first thing we need to do is backup this datafile as copy using RMAN:
[oracle@ora11gr1a ~]$ rman

Recovery Manager: Release 11.1.0.7.0 - Production on Thu Jan 29 17:29:59 2009

Copyright (c) 1982, 2007, Oracle. All rights reserved.

RMAN> connect target;

connected to target database: ORA11GR1 (DBID=3707369966)

RMAN> backup as copy datafile 4
2> format '/u02/oradata/ORA11GR1/datafile/users01.dbf';

Starting backup at 29-JAN-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users_4q759m64_.dbf
output file name=/u02/oradata/ORA11GR1/datafile/users01.dbf tag=TAG20090129T182532 RECID=14 STAMP=677442334
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 29-JAN-09
Rollforward image copy

Since switching to the datafile copy will require datafile recovery, it might be a good idea to roll this image copy forward first, in order to bring it up to date...
RMAN> list copy of datafile 4;

List of Datafile Copies
=======================

Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
14 4 A 29-JAN-09 2033489 29-JAN-09
Name: /u02/oradata/ORA11GR1/datafile/users01.dbf
Tag: TAG20090129T182532

RMAN> backup incremental from scn 2033489 datafile 4 format '/u02/oradata/ORA11GR1/datafile/%U';

Starting backup at 29-JAN-09

using channel ORA_DISK_1
backup will be obsolete on date 05-FEB-09
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/u01/oradata/ORA11GR1/datafile/o1_mf_users_4q759m64_.dbf
channel ORA_DISK_1: starting piece 1 at 29-JAN-09
channel ORA_DISK_1: finished piece 1 at 29-JAN-09
piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag=TAG20090129T183004 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

using channel ORA_DISK_1
backup will be obsolete on date 05-FEB-09
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 29-JAN-09
channel ORA_DISK_1: finished piece 1 at 29-JAN-09
piece handle=/u02/oradata/ORA11GR1/datafile/1ok61t1e_1_1 tag=TAG20090129T183004 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 29-JAN-09

RMAN> recover copy of datafile 4;

Starting recover at 29-JAN-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy file number=00004 name=/u02/oradata/ORA11GR1/datafile/users01.dbf
channel ORA_DISK_1: reading from backup piece /u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1
channel ORA_DISK_1: piece handle=/u02/oradata/ORA11GR1/datafile/1nk61t1d_1_1 tag=TAG20090129T183004
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished recover at 29-JAN-09

RMAN> list copy of datafile 4;

List of Datafile Copies
=======================

Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
15 4 A 29-JAN-09 2033633 29-JAN-09
Name: /u02/oradata/ORA11GR1/datafile/users01.dbf
Tag: TAG20090129T182532
Note that the image copy's checkpoint SCN has moved forward. Keep in mind that this step generally makes sense only if you have block change tracking enabled and/or there is a huge amount of archivelogs to apply, as it is a trade-off between creating and applying the incremental backup versus directly applying all the necessary archivelogs. This step can be done starting with Oracle 10G.
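If block change tracking is not enabled yet, it can be turned on with a single statement so that incremental backups read only the changed blocks (the file name below is just an example):

```sql
-- enables fast incremental backups by recording changed blocks
-- in a tracking file instead of scanning every datafile block
alter database enable block change tracking
  using file '/u01/oradata/ORA11GR1/change_tracking.f';
```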

Switch datafile

All we have to do now is execute a small RMAN block...
RMAN> run
2> {
3> sql 'alter database datafile 4 offline';
4> switch datafile 4 to datafilecopy '/u02/oradata/ORA11GR1/datafile/users01.dbf';
5> recover datafile 4;
6> sql 'alter database datafile 4 online';
7> }

sql statement: alter database datafile 4 offline

datafile 4 switched to datafile copy
input datafile copy RECID=15 STAMP=677442659 file name=/u02/oradata/ORA11GR1/datafile/users01.dbf

Starting recover at 29-JAN-09
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 29-JAN-09

sql statement: alter database datafile 4 online
This is where you'll have some downtime. The amount of downtime depends on how long it takes to recover the datafile, which will generally be a function of how many archivelogs need to be applied, which, in turn, can be reduced by using the incremental backup. The point is that this step can be really fast.

Don't forget to watch out for nologging operations!

Sunday, January 25, 2009

Update and rownum oddity

Take a look at the following table:
SQL> create table codes
2 (
3 code varchar2(10),
4 used number,
5 constraint pk_codes primary key (used, code)
6 ) organization index;

Table created.

SQL> insert into codes
2 select dbms_random.string('x', 10),
3 case when level <= 5000 then 1 else 0 end
4 from dual
5 connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'codes');

PL/SQL procedure successfully completed.
This table contains a set of codes, with the used column representing whether the code has already been used (1) or not (0). We need to return one (random) unused code from the above table and mark it as used. This is very easy to achieve using the following update statement:
SQL> variable code varchar2(10);
SQL> set autot traceonly
SQL> update codes set used=1
2 where used=0 and rownum=1
3 returning code into :code;

1 row updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 1169687698

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 13 | 117 (1)| 00:00:02 |
| 1 | UPDATE | CODES | | | | |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | INDEX FAST FULL SCAN| PK_CODES | 50000 | 634K| 117 (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM=1)
3 - filter("USED"=0)


Statistics
----------------------------------------------------------
1 recursive calls
5 db block gets
12 consistent gets
0 physical reads
124 redo size
913 bytes sent via SQL*Net to client
874 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
However, the plan is not exactly what I would expect (I'm running this on 11.1.0.7)... Why do an INDEX FAST FULL SCAN when we can do an INDEX RANGE SCAN to get just one row? This is exactly what a regular select does, after all:
SQL> select *
2 from codes
3 where used=0 and rownum=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 802332609

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
|* 2 | INDEX RANGE SCAN| PK_CODES | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM=1)
2 - access("USED"=0)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

10053 trace

Let's take a look at the 10053 trace output for both the select and update statements. I'm picking out the relevant sections.

UPDATE:
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: CODES Alias: CODES
#Rows: 100000 #Blks: 423 AvgRowLen: 13.00
Index Stats::
Index: PK_CODES Col#: 2 1
LVLS: 1 #LB: 423 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 0.00
Access path analysis for CODES
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CODES[CODES]
Table: CODES Alias: CODES
Card: Original: 100000.000000 Rounded: 50000 Computed: 50000.00 Non Adjusted: 50000.00
Access Path: index (index (FFS))
Index: PK_CODES
resc_io: 116.00 resc_cpu: 20012369
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 116.81 Resp: 116.81 Degree: 1
Cost_io: 116.00 Cost_cpu: 20012369
Resp_io: 116.00 Resp_cpu: 20012369
OPTIMIZER PERCENT INDEX CACHING = 0


Access Path: index (IndexOnly)
Index: PK_CODES
resc_io: 213.00 resc_cpu: 11516867
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 213.47 Resp: 213.47 Degree: 1
Best:: AccessPath: IndexFFS
Index: PK_CODES
Cost: 116.81 Degree: 1 Resp: 116.81 Card: 50000.00 Bytes: 0

***************************************
Nothing else is tried, and this is what the optimizer selects as the best execution plan. Note how the cardinalities are reported (Rounded: 50000, Computed: 50000.00).

SELECT:

In addition to the above, the trace has one more section:
***************************************
SINGLE TABLE ACCESS PATH (First K Rows)
Single Table Cardinality Estimation for CODES[CODES]
Table: CODES Alias: CODES
Card: Original: 2.000000 Rounded: 1 Computed: 1.00 Non Adjusted: 1.00
Access Path: index (index (FFS))
Index: PK_CODES
resc_io: 2.00 resc_cpu: 7461
ix_sel: 0.000000 ix_sel_with_filters: 1.000000
Access Path: index (FFS)
Cost: 2.00 Resp: 2.00 Degree: 1
Cost_io: 2.00 Cost_cpu: 7461
Resp_io: 2.00 Resp_cpu: 7461
OPTIMIZER PERCENT INDEX CACHING = 0


Access Path: index (IndexOnly)
Index: PK_CODES
resc_io: 2.00 resc_cpu: 14443
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange
Index: PK_CODES
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 1.00 Bytes: 13

First K Rows: unchanged join prefix len = 1
Join order[1]: CODES[CODES]#0
***********************
Note how the cardinalities have changed this time (Rounded: 1). What happened is that the rownum=1 predicate put the optimizer into first_rows(1) mode (the First K Rows section above), which affected how the cardinalities were calculated.

We know that the first_rows(n) hint is ignored in update and delete statements, so our update statement always runs in all_rows mode.
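The cardinality change above is plain arithmetic. Here is a minimal Python sketch of it (my own simplification of what the trace shows, not the CBO's actual code): under first-k-rows optimization the table cardinality is rescaled to the number of rows needed to produce k matches.

```python
def first_k_rows_card(table_rows, selectivity, k):
    # Under first_rows(k) the optimizer assumes it only needs to read
    # enough rows to produce k matching rows: k / selectivity of them,
    # capped at the total row count.
    return min(k / selectivity, table_rows)

# all_rows: the used=0 filter (selectivity 0.5) halves the full cardinality
print(100000 * 0.5)                        # 50000.0, as in the UPDATE trace
# first_rows(1) with selectivity 0.5: Card: Original becomes 2, rounded to 1
print(first_k_rows_card(100000, 0.5, 1))   # 2.0, as in the SELECT trace
```

This matches the trace exactly: Card: Original: 100000 in the all_rows section versus Card: Original: 2.000000 in the First K Rows section.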

You can confirm that the select behaves exactly the same way in all_rows mode:
SQL> select /*+ all_rows */ *
2 from codes
3 where used=0 and rownum=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 2682988822

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 117 (1)| 00:00:02 |
|* 1 | COUNT STOPKEY | | | | | |
|* 2 | INDEX FAST FULL SCAN| PK_CODES | 50000 | 634K| 117 (1)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM=1)
2 - filter("USED"=0)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
590 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Well, I guess here goes my next wish for a CBO improvement: how update and delete statements are costed when their predicates involve rownum...

Friday, January 23, 2009

11G Managed Recovery Process

The MRP process is commonly referenced throughout the web as the process which performs redo apply on your managed standby database. MRP may work in a team with PQ slave (or prnn in 11G) processes in case you start managed recovery in parallel.

Unfortunately, the term performs redo apply seems to be causing some confusion along the way. I've found it very common for people to believe that it is the MRP process which performs both the reads from the redo stream and the writes of the changes into the datafiles. For example, they sometimes try to battle slow log apply by increasing managed recovery parallelism without realizing that there is in fact a bit more to the puzzle.

Some details

I'm going to use my 11G DataGuard setup to demonstrate a couple of key points. My setup operates with real-time apply (no parallelism), which makes the entire example a bit simpler to demonstrate.

Let's update a row on the source DB:
SQL> update t set n=n;

1 row updated.

SQL> commit;

Commit complete.
Now, take a look at the MRP strace output which was produced as a result of the above change:
[oracle@ora11gr1b fd]$ ps -fp 6364
UID PID PPID C STIME TTY TIME CMD
oracle 6364 1 0 19:59 ? 00:00:00 ora_mrp0_ora11gr1
[oracle@ora11gr1b fd]$ strace -e pread,pwrite -p 6364
Process 6364 attached - interrupt to quit
pread(32,..., 512, 45568) = 512
pread(32,..., 1024, 46080) = 1024
pread(30,..., 8192, 2228224) = 8192
pread(30,..., 8192, 259858432) = 8192
pread(31,..., 8192, 9461760) = 8192
I've set filesystemio_options=none so we can observe pread/pwrite syscalls, which are easier to follow than the asynchronous io_submit/io_getevents system calls (and we don't have to care about the O_DIRECT flag either).

Let's check what these file descriptors are:
[oracle@ora11gr1b fd]$ cd /proc/6364/fd
[oracle@ora11gr1b fd]$ file 30
30: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_undotbs1_0fk5fp2c_.dbf'
[oracle@ora11gr1b fd]$ file 31
31: symbolic link to `/u01/oradata/ORA11GR1B/datafile/o1_mf_users_0ik5fp4u_.dbf'
[oracle@ora11gr1b fd]$ file 32
32: symbolic link to `/u01/oradata/ORA11GR1B/onlinelog/o1_mf_9_4qn2rkhk_.log'
In other words, the process read from the standby logfile and from the undo and users (this is where our table is) tablespaces. However, as you might notice, all these calls were reads: we didn't write anything.

From time to time MRP gets a bit more interesting, for example during logfile switches:
...
pwrite(27,..., 16384, 16384) = 16384
pread(27,..., 16384, 16384) = 16384
pread(28,..., 8192, 8192) = 8192
pread(29,..., 8192, 8192) = 8192
pread(30,..., 8192, 8192) = 8192
pread(31,..., 8192, 8192) = 8192
pread(27,..., 16384, 393216) = 16384
pwrite(28,..., 8192, 8192) = 8192
pwrite(29,..., 8192, 8192) = 8192
pwrite(30,..., 8192, 8192) = 8192
pwrite(31,..., 8192, 8192) = 8192
...
Here we actually wrote something (27 is the controlfile, 28 and 29 are the system and sysaux tablespaces respectively). However, from the offset (the fourth parameter) you can see that we are writing to the second block of these datafiles. There is no (and can't be) any user data there.

Who is writing the data then?

The first thing you might want to check is, of course, the database writer process:
[oracle@ora11gr1b ~]$ ps -fp 6303
UID PID PPID C STIME TTY TIME CMD
oracle 6303 1 0 19:55 ? 00:00:00 ora_dbw0_ora11gr1
[oracle@ora11gr1b ~]$ strace -e pread,pwrite -p 6303
Process 6303 attached - interrupt to quit
pwrite(23,..., 8192, 2097152) = 8192
pwrite(23,..., 8192, 35987456) = 8192
pwrite(24,..., 8192, 9461760) = 8192
This is the output produced by the standby's DBWR right after we updated our table on the source. We wrote two undo blocks (fd 23) and one block in the users tablespace (fd 24). By looking at the offset for file descriptor 24 we can confirm that we wrote the table itself:
SQL> select segment_name
2 from dba_extents
3 where tablespace_name='USERS'
4 and 9461760/8192 between block_id and block_id + blocks-1;

SEGMENT_NAME
--------------------------
T
From the above you can confirm that it is the DBWR process which wrote the changes for us, and that it plays a crucial role during your standby database operations.

MRP's workload consists mostly of reads from the redo stream, the datafiles and the controlfile, plus occasional writes into the controlfile and the datafile headers.
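The offset-to-block arithmetic behind the dba_extents check above is worth spelling out. A quick sketch (8192 is my db_block_size; the same division appears in the SQL above):

```python
def block_number(offset_bytes, block_size=8192):
    # pread/pwrite offsets are byte offsets from the start of the file,
    # so dividing by the block size gives the datafile block number
    return offset_bytes // block_size

print(block_number(9461760))   # 1155: the block we matched against dba_extents
print(block_number(8192))      # 1: the second block of the file, i.e. the
                               # header area MRP writes during log switches
```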

If your standby is suffering from redo apply performance problems, you may want to pay attention to both the MRP and the DBWR processes.

Tuesday, January 20, 2009

Latch waits in 11G: queuing

Following up on my previous post about 11G latch waits using a semaphore post-wait by the latch holder...

What I was curious to confirm is whether the latch holder posts only the first waiter in the list, or all of them.

The results

Here are the results I've got using three sessions:
  1. Session 1: gets RC latch
  2. Session 2: misses the latch, begins to sleep
    semop(753664, 0x7fbfffa348, 1)
  3. Session 3: misses the latch, begins to sleep
    semop(753664, 0x7fbfff5f58, 1)
  4. Session 1: completes, posts session 2
    semctl(753664, 28, SETVAL, 0x1)
  5. Session 2: completes, posts session 3
    semctl(753664, 31, SETVAL, 0x1)
Note that though all sessions used the same semaphore set (753664), sessions 1 and 2 posted different semaphores within that set (28 and 31 respectively), and the waiters were woken up by their respective holders in the order in which they began to wait.
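The queueing behavior can be modelled in a few lines of code. Below is a toy Python model (threads and a deque standing in for Oracle processes and SGA state; this is my illustration, not Oracle's implementation): each waiter sleeps on its own private semaphore, and a releasing holder posts only the head of the queue.

```python
import threading
from collections import deque

class PostWaitLatch:
    """Toy post-wait latch: each waiter sleeps on its own semaphore and
    the releasing holder posts only the first waiter in the queue."""
    def __init__(self):
        self._state = threading.Lock()    # protects _held and _waiters
        self._held = False
        self._waiters = deque()

    def acquire(self):
        with self._state:
            if not self._held:
                self._held = True         # latch is free: immediate get
                return
            sem = threading.Semaphore(0)  # private semaphore, cf. semop()
            self._waiters.append(sem)
        sem.acquire()                     # sleep until the holder posts us

    def release(self):
        with self._state:
            if self._waiters:
                # ownership passes straight to the first waiter,
                # cf. semctl(..., SETVAL, 0x1) in the strace output
                self._waiters.popleft().release()
            else:
                self._held = False
```

With this scheme the waiters are necessarily woken in FIFO order, which is exactly what the strace output above shows.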

sleeps + spin_gets = misses ?

That's an ideal case, of course: a just-awakened process might still have to compete for the latch if the previous holder decides to take the same latch again, and if the awakened process doesn't manage to get the latch during the spin it will have to sleep again (thus breaking the above formula). However, the proximity to the above equation might be useful as one of the indicators of whether a latch is a potential subject of the "new" algorithm: the post-wait model should tend to produce fewer sleeps (and hence spins) per miss, otherwise there would not be much point to it.
SQL> select misses, sleeps, spin_gets, sleeps+spin_gets
2 from v$latch
3 where name='Result Cache: Latch';

MISSES SLEEPS SPIN_GETS SLEEPS+SPIN_GETS
---------- ---------- ---------- ----------------
8208 275 7946 8221
The numbers are pretty close indeed. All this made me a bit more curious, and I decided to take a closer look at what shows up in strace when a waiter has to sleep more than once in a row.

I ran two parallel sessions executing a loop which selects from the result cache...
begin
for i in 1 .. 1000000
loop
for cur in (select /*+ result_cache */ * from t where n=1)
loop
null;
end loop;
end loop;
end;
...made sure the above formula became further distorted, and took a look at the strace output for one of the sessions, which led me to an interesting discovery:
[oracle@ora11gr1a ~]$ strace -e semop,semctl,semtimedop -p 7591
Process 7591 attached - interrupt to quit
...
semctl(1015808, 30, SETVAL, 0x1) = 0
semctl(1015808, 30, SETVAL, 0x1) = 0
semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = 0
semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable)
semop(1015808, 0x7fbfff5f58, 1) = 0
...
What we see here is a traditional time-based sleep using the semtimedop syscall. Does that mean that Oracle can still choose between the algorithms dynamically? The answer, according to extended SQL trace, seems to be no, as I wasn't able to match these semtimedop syscalls with latch waits in the SQL trace (so these syscalls were used for something else).

P.S. It looks like 11G has invalidated all the pseudo code you may find around the web about the latch acquisition algorithm Oracle uses.

Monday, January 19, 2009

"LONGHOLD" latch waits on Linux

Take a look at the figure from my previous blog post:
call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5001 0.53 53.71 0 0 0 500000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5003 0.53 53.71 0 0 0 500000

...

Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
latch free 1 53.20 53.20
What's unusual about it is a single latch wait of 53 seconds without consuming any CPU time. This is something different from the usual
get -> miss -> spin -> sleep
latch acquisition algorithm we are generally used to.

Sequence of events

To remind you, the sequence of events goes like this:
  1. First session: do select /*+ result_cache */ * from t which places a single big result into Result Cache memory.
  2. First session: do select count(*) from v$result_cache_memory which grabs Result Cache latch for a long amount of time.
  3. Second session: do select /*+ result_cache */ * from t which forces us to wait until Result Cache latch will be freed up by our first session.
Latch statistics

Let's take a look at v$latch right after we execute the above three steps:
SQL> select gets, misses, sleeps
2 from v$latch
3 where name='Result Cache: Latch';

GETS MISSES SLEEPS
---------- ---------- ----------
53494 1 1
As we can see, the session indeed went to sleep after a single miss. The first thing which may come to your mind is that I'm running this on a single-CPU box, where we don't spin if we miss a latch (because _spin_count defaults to 1 on single-CPU boxes). This is not the case:
SQL> show parameter cpu_count

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 2
And my DB's _spin_count has a default value of 2000 as well.

Let's look a bit further:
SQL> select location, sleep_count, wtr_slp_count, longhold_count
2 from v$latch_misses
3 where parent_name='Result Cache: Latch'
4 and sleep_count+wtr_slp_count+longhold_count > 0;

LOCATION SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT
------------------------------ ----------- ------------- --------------
Result Cache: Serialization12 0 1 0
Result Cache: Serialization16 1 0 1
Note that the LONGHOLD_COUNT column equals 1 for Result Cache: Serialization16. That means that a latch was held for the entire duration of someone else's sleep which, again, conforms to our observation. But what is the underlying mechanism for this?

Under the hood

Here is an excerpt from our second session's strace output:
...
getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0
semop(98304, 0x7fbfff5f58, 1) = 0 -- enters the sleep here
times({tms_utime=72, tms_stime=131, tms_cutime=0, tms_cstime=0}) = 429496789
getrusage(RUSAGE_SELF, {ru_utime={0, 729889}, ru_stime={1, 317799}, ...}) = 0
...
And the first session:
...
mmap(0x2a972e7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972e7000
mmap(0x2a972f7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972f7000
semctl(98304, 31, SETVAL, 0x7f00000001) = 0 -- issues upon completion
getrusage(RUSAGE_SELF, {ru_utime={206, 80670}, ru_stime={2, 686591}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={206, 81670}, ru_stime={2, 686591}, ...}) = 0
...
In other words:
  1. Second session: upon discovering that the latch is busy, it begins to sleep by issuing
    semop(98304, 0x7fbfff5f58, 1) command.
  2. First session: upon completion issues
    semctl(98304, 31, SETVAL, 0x7f00000001) which wakes our second session up.
The first argument is the semid, which means that both sessions are operating on the same semaphore set. Neither of the semaphore syscalls shows up if we execute each of the above sessions separately.

Though we do sleep under the normal latch acquisition algorithm as well, that sleep is based on time, not on the latch holder posting us.
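The difference between the two sleeping styles can be sketched with Python's threading primitives (an analogy only: Semaphore.acquire with a timeout plays the role of semtimedop, and a plain blocking acquire the role of semop):

```python
import threading

def timed_sleep_wait(sem, timeout=0.01):
    # classic algorithm: sleep for a bounded time (cf. semtimedop with
    # {0, 10000000}), wake up on our own, re-check, possibly sleep again
    sleeps = 1
    while not sem.acquire(timeout=timeout):
        sleeps += 1          # EAGAIN-style timeout: go around again
    return sleeps            # total number of sleep calls made

def post_wait(sem):
    # post-wait style: block indefinitely until the holder posts us
    # (cf. semop in the strace output)
    sem.acquire()
```

The key difference: the timed sleeper keeps waking up to re-check on its own schedule, while the post-wait sleeper makes exactly one wait call and relies entirely on the holder.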

Which algorithm?

The interesting question to answer is how the waiter determines which algorithm to use. It could be (A) a special "longhold" mode get for a latch, or (B) the waiter might check for how long the latch has already been held. Thus far I tend to think that it is closer to (A) (a special get mode or a separate flag somewhere), because if we execute the above two sessions concurrently we always get an enormously unusual number of sleeps, even if we place only one single-row result into the Result Cache memory (thus making sure there are no significant delays selecting from v$result_cache_memory).

Maybe some of you could share other ideas as well.

Sunday, January 18, 2009

v$result_cache_memory or what is the newest method to hang stuff up?

Sometimes I get quite amused by the "side effects" you can discover while playing with features.

Prerequisites

First, we need to place something relatively big into the result cache memory. The easiest way to achieve this is to start like this:
SQL> alter system set result_cache_max_size=64m;

System altered.

SQL> alter system set result_cache_max_result=100;

System altered.
The first parameter controls how much memory out of your shared pool Oracle can dedicate to the result cache. The second parameter tells how much of that memory, in percent, can be dedicated to a single result set. The default value is 5, and I've set it to 100 so we don't have to dedicate a lot of memory to demonstrate my point.

All we have to do now is populate the result cache memory:
SQL> create table t as
2 select level n, rpad('*', 100, '*') v
3 from dual
4 connect by level <= 500000;

Table created

SQL> begin
2 for cur in (select /*+ result_cache*/ * from t)
3 loop
4 null;
5 end loop;
6 end;
7 /

PL/SQL procedure successfully completed
What's next?

Let's check how much result cache memory is allocated:
SQL> set timing on
SQL> select count(*) from v$result_cache_memory;

COUNT(*)
----------
65536

Executed in 55.984 seconds
That's almost a minute to get the result, yes.

So what?

There are (nasty) consequences.

It's time to buy Core i7

Let's start by taking a look at the trace file of our session:
SQL ID: 0zudkuq64dkwx
Plan Hash: 600544352
select count(*)
from
v$result_cache_memory


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 54.65 56.96 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 54.65 56.96 0 0 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)
65536 FIXED TABLE FULL X$QESRCMEM (cr=0 pr=0 pw=0 time=0 us cost=0 size=13 card=1)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.01
First of all, all the time was spent actively consuming CPU. Selecting from X$QESRCMEM has become a pretty expensive thing (in fact, at first I mistook such a huge delay for the session simply being hung while spinning on the CPU).

Can't use result cache

Any session which attempts to use result cache will be forced to wait until our first session completes:
SQL ID: bzf4nvdav1m2j
Plan Hash: 1601196873
SELECT /*+ result_cache*/ *
FROM
T


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5001 0.53 53.71 0 0 0 500000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5003 0.53 53.71 0 0 0 500000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 38 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
500000 RESULT CACHE 49yjv3h37cjhz39gkd90p4413p (cr=0 pr=0 pw=0 time=0 us)
0 TABLE ACCESS FULL T (cr=0 pr=0 pw=0 time=0 us cost=2127 size=30436770 card=468258)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
latch free 1 53.20 53.20
And the latch name is...
WAIT #2: nam='latch free' ela= 53205337 address=1610838648 number=377 tries=0 obj#=-1 tim=1232316789354155

SQL> select name from v$latch where latch#=377;

NAME
----------------------------------------------------------------
Result Cache: Latch
Surprised? I'm not. Don't expect me to start complaining about the absence of shared mode gets again.

The Real Deal or result_cache_mode = force

Things get really nasty if you set result_cache_mode=force, which has the potential of rendering your entire system unusable for an extended period of time. Even the simplest queries like select * from dual will hang, escalating the situation to the point where your normal users will be unable even to connect to the database, let alone execute their queries. For example, trying to connect with SQL*Plus results in the following:
SQL ID: d6vwqbw6r2ffk
Plan Hash: 1388734953
SELECT USER
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 43.40 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 43.41 0 0 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38

Rows Row Source Operation
------- ---------------------------------------------------
1 RESULT CACHE 4c48ztv5ztq25dkd78360h2uxb (cr=0 pr=0 pw=0 time=0 us)
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
latch free 1 43.39 43.39
SQL*Net message from client 2 0.00 0.00
I don't have to mention the latch name again.

Real World?

What are the chances of meeting the above situation on a real system? First of all, having a system with more than 100GB of SGA is not uncommon these days. The thing gets worse the more memory is occupied by a single result; having a lot of small results doesn't seem to be a problem. If we take the default 5% threshold for a single result to be eligible for the result cache, facing the above situation requires 1280MB of result cache memory. That looks like quite a big number; however, on the scale of a hundred-gig SGA it no longer seems improbable. Aggregating multiple gigabytes of data into a 64MB result doesn't sound insane either (at least not to me; you may have a different opinion here).
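The 1280MB figure comes straight from the threshold arithmetic; a one-liner to make it explicit:

```python
def min_cache_for_result(result_mb, max_result_pct=5):
    # smallest result_cache_max_size (in MB) at which a single result
    # of result_mb fits under the result_cache_max_result percentage
    return result_mb * 100 / max_result_pct

print(min_cache_for_result(64))        # 1280.0 MB with the default 5%
print(min_cache_for_result(64, 100))   # 64.0 MB with my test setting of 100
```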

Plus it takes someone to query v$result_cache_memory. So just don't do it. There are other views (like v$result_cache_objects) which you can use to get at least part of that info, alas at a lower granularity.

Environment

The above tests were verified on Oracle 11GR1 running on Windows XP x64 (native) and Linux x86-64 under VMware (with XP x64 acting as a host OS).

Friday, January 16, 2009

insert /*+ append */ into ... values (...)

Alex Egorov has left a very fascinating comment on my post about 11G adaptive direct path reads which I thought definitely deserves mentioning.

What we are talking about here is a serious behavior change in how statements like...
insert /*+ append */ into t values (1)
...are handled in 11G.

Saving yourself from yourself... not anymore

In previous Oracle versions, the append hint in cases like the above was simply ignored, for some good reasons. In 11G the hint is obeyed.

To better illustrate what that means, let's take the following example:
begin
for i in 1 .. 1000
loop
insert /*+ append */ into t values (i);
commit;
end loop;
end;
...and compare how the behavior changes in 11G.

First of all, I've seen legacy code like the above in many places where developers put the append hint everywhere they could because they had heard it is a fast=true kind of thing.

After you upgrade, you might expect the following:

  • If you don't commit after each row, the code will break with ORA-12838: cannot read/modify an object after modifying it in parallel.
  • If you do commit (as in my example), you have every chance of growing the table beyond the skies, as each direct path insert writes beyond the HWM (in 11G, the above code grows the table by 1000 blocks each time it is executed).
  • Expect serious locking issues, as each direct path insert has to obtain an exclusive lock on the table first.

The above might produce quite an unpleasant surprise for some of you...
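To put rough numbers on the space issue, here is a back-of-the-envelope sketch (rows_per_block=100 is just an assumed figure for a small row, not a measurement):

```python
import math

def conventional_blocks(rows, rows_per_block=100):
    # the conventional path packs rows together and reuses free space
    # below the HWM
    return math.ceil(rows / rows_per_block)

def single_row_append_blocks(rows):
    # in 11G each obeyed append insert+commit allocates at least one
    # fresh block above the HWM and never reuses the previous one
    return rows

print(conventional_blocks(1000))        # 10 blocks for the whole loop
print(single_row_append_blocks(1000))   # 1000 blocks per run of the loop
```

A two-orders-of-magnitude difference in space usage for exactly the same data, every time the loop runs.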

Thursday, January 15, 2009

Hot backup mode and a little known fact

Most people are aware of the implications hot backup mode has for your redo stream. They'll tell you that on the first modification Oracle dumps the entire block image into the redo stream, and that subsequent writes of the block go as usual.

This is only half of the story.

The example

Let's start by creating a simple table I'll use as an example:
SQL> create table t pctfree 99 pctused 1 as
2 select level n
3 from dual
4 connect by level <= 100;

Table created.
Now, let's measure the amount of redo generated if I'm going to update all rows in the table:
SQL> set autot traceonly stat
SQL> update t set n=n;

100 rows updated.


Statistics
----------------------------------------------------------
30 recursive calls
20 db block gets
44 consistent gets
18 physical reads
12152 redo size
826 bytes sent via SQL*Net to client
769 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
100 rows processed
That's almost 12K of redo. Let's see what happens if I put tablespace into a hot backup mode:
SQL> alter tablespace users begin backup;

Tablespace altered.

SQL> update t set n=n;

100 rows updated.


Statistics
----------------------------------------------------------
0 recursive calls
20 db block gets
20 consistent gets
0 physical reads
152708 redo size
830 bytes sent via SQL*Net to client
769 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100 rows processed
The amount of redo grew to almost 150K. Oracle has to dump the entire block image into the redo stream because of the fractured block problem. If we do the same update a second time:
SQL> update t set n=n;

100 rows updated.


Statistics
----------------------------------------------------------
0 recursive calls
17 db block gets
20 consistent gets
0 physical reads
11980 redo size
830 bytes sent via SQL*Net to client
769 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100 rows processed
It goes as usual. In fact, and this is kind of amazing, this is what most of the sources available on the internet will tell you about the behavior: the full image of the block gets written only on the first modification in order to prevent the fractured block problem, and subsequent modifications go as usual.

Only half of the story

Let me continue with my example:
SQL> alter system flush buffer_cache;

System altered.

SQL> update t set n=n;

100 rows updated.


Statistics
----------------------------------------------------------
0 recursive calls
20 db block gets
20 consistent gets
20 physical reads
151916 redo size
831 bytes sent via SQL*Net to client
769 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
100 rows processed
What you see here is the entire image of the blocks being written into the redo stream again.

The true story

Corrected 27-Jan-2009: initially I thought that it is the write to disk that matters; however, thanks to Jonathan Lewis for correcting me: it's the read from disk into the buffer cache. The full block image is dumped on the first change after the block has been read into the buffer cache, not just once per begin backup. Please see his first comment.

Don't confuse correlation with causation.
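A rough redo model consistent with the three measurements above (per_row_redo=120 is a made-up ballpark for this table, not a measured constant):

```python
def backup_mode_redo(rows_changed, blocks_imaged,
                     per_row_redo=120, block_size=8192):
    # ordinary change vectors for every changed row, plus a full block
    # image for each block changed for the first time since it was
    # read into the buffer cache
    return rows_changed * per_row_redo + blocks_imaged * block_size

print(backup_mode_redo(100, 0))    # 12000: cf. ~12K when the blocks were
                                   # already imaged (second update)
print(backup_mode_redo(100, 20))   # 175840: cf. ~150K right after begin
                                   # backup, or after flushing the cache
```

The model explains why flushing the buffer cache restores the ~150K figure: the flush forces a fresh read, so every block is "imaged" again on its next change.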

Wednesday, January 14, 2009

Consistent gets from cache (fastpath)

There is an interesting optimization introduced in 11G regarding the way blocks are accessed in the buffer cache.

Let's start with a simple example to demonstrate my point.

10GR2 example
SQL> create table t (
2 n number,
3 v varchar2(100),
4 constraint pk_n primary key (n)
5 );

Table created

SQL>
SQL> insert into t
2 select level, rpad('*', 100, '*')
3 from dual
4 connect by level <= 1000;

1000 rows inserted

SQL> commit;

Commit complete
I'm going to create a procedure to return the number of consistent gets in my session:
SQL> create or replace procedure get_cg(
2 p_cg out number
3 ) is
4 begin
5 select ms.value into p_cg
6 from v$mystat ms, v$statname sn
7 where ms.STATISTIC#=sn.STATISTIC#
8 and sn.NAME='consistent gets';
9 end get_cg;
10 /

Procedure created
Let's execute a simple select and measure the number of consistent gets:
SQL> declare
2 l_cg_b number;
3 l_cg_a number;
4 begin
5 get_cg(l_cg_b);
6 for cur in (select n from (select mod(level, 1000)+1 l from dual connect by level <= 100000) l, t where t.n=l.l)
7 loop
8 null;
9 end loop;
10 get_cg(l_cg_a);
11 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
12 end;
13 /

consistent gets: 100012

PL/SQL procedure successfully completed
This is a somewhat expected number: we did an index lookup (a unique scan) for each of the 100K rows returned by our connect by query.

11GR1 example

Let's do the same example in 11G. I've slightly modified the procedure to return two statistics: consistent gets and consistent gets from cache (fastpath):
SQL> create or replace procedure get_cg(
2 p_cg out number,
3 p_cgfp out number
4 ) is
5 begin
6 select max(case sn.NAME when 'consistent gets' then ms.value end),
7 max(case sn.NAME when 'consistent gets from cache (fastpath)' then ms.value end)
8 into p_cg, p_cgfp
9 from v$mystat ms, v$statname sn
10 where ms.STATISTIC#=sn.STATISTIC#
11 and sn.NAME in ('consistent gets', 'consistent gets from cache (fastpath)');
12 end get_cg;
13 /

Procedure created

SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select mod(level, 1000)+1 l from dual connect by level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16
/
consistent gets: 2602
consistent gets (fastpath): 1400

PL/SQL procedure successfully completed
That's quite a difference! 2602 consistent gets in 11GR1 versus ~100K in 10GR2. Note the new consistent gets from cache (fastpath) statistic appearing in this test case.

A slightly different example

To make you a bit more curious, here is a slightly different example executed on 11GR1:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select trunc(dbms_random.value(1, 1000)) l from dual connect by level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 101998
consistent gets (fastpath): 2934

PL/SQL procedure successfully completed
As soon as we started reading the index blocks in a random fashion the effect, well, disappeared. Thus far the optimization seems to kick in only when we repeatedly access the same block over and over again.
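Here is a toy Python model consistent with these observations (purely my speculation about the mechanism: presumably the buffer stays pinned between calls, which this simplification ignores): a lookup that hits the same block as the previous lookup takes the short-circuit path.

```python
def count_gets(block_sequence):
    # a visit to the same block as the previous lookup is assumed to
    # take the short-circuit (fastpath) code path
    full, fast, last = 0, 0, None
    for b in block_sequence:
        if b == last:
            fast += 1
        else:
            full += 1
            last = b
    return full, fast

# mod(level, 1000)+1 walks the keys in order: long runs on each leaf block
ordered = [i % 1000 // 100 for i in range(100000)]   # ~10 "leaf blocks"
print(count_gets(ordered))     # (1000, 99000): mostly fastpath

# dbms_random-style access defeats the optimization
import random
random.seed(0)
rnd = [random.randrange(10) for _ in range(100000)]
full, fast = count_gets(rnd)
print(full > fast)             # True: random order rarely repeats a block
```

The model reproduces the qualitative result: ordered key access is dominated by fastpath gets, random access is not.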

Another slightly different example

10GR2:
SQL> declare
2 l_cg_b number;
3 l_cg_a number;
4 begin
5 get_cg(l_cg_b);
6 for cur in (select v from (select mod(level, 1000)+1 l from dual connect by level <= 100000) l, t where t.n=l.l)
7 loop
8 null;
9 end loop;
10 get_cg(l_cg_a);
11 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
12 end;
13 /
consistent gets: 201001

PL/SQL procedure successfully completed
What we are doing here is reading the data from the table itself (not only from the index). Again, that's an expected number: we had to do an index unique scan followed by a table access by rowid, resulting in two LIOs per returned row. Now, take a look at what happens in 11GR1:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select v from (select mod(level, 1000)+1 l from dual connect by level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 102602
consistent gets (fastpath): 1400

PL/SQL procedure successfully completed
In other words, the index block gets were "neutralized", but the table block gets were not. Another point is that the optimization still works for the index blocks even though we access table blocks in between.

Works only for index blocks?

That's really simple to verify using the following example:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select v from (select case mod(level, 2) when 0 then 'AAADq1AAEAAAAJHAAA' else 'AAADq1AAEAAAAJHAAB' end l from dual connect by level <= 100000) l, t where t.rowid=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 1000
consistent gets (fastpath): 1000

PL/SQL procedure successfully completed
The same example in 10GR2, of course, produces ~100K LIOs, so the optimization seems to kick in for table blocks as well. Moreover, the behavior in the previous example cannot be explained by the table blocks being randomly distributed, since the clustering factor is perfect:
SQL> select sum(bn_diff)
2 from (
3 select case when bn!=lag(bn, 1, bn) over (order by n) then 1 else 0 end bn_diff
4 from (select n, dbms_rowid.rowid_block_number(rowid) bn from t)
5 );

SUM(BN_DIFF)
------------
15

SQL> select count (distinct dbms_rowid.rowid_block_number(rowid)) from t;

COUNT(DISTINCTDBMS_ROWID.ROWID
------------------------------
16
Perhaps there is some overhead associated with the fastpath, and the logic assumes a bad clustering factor by default (which works for the majority of cases), so it doesn't try to do a fastpath for table blocks in our previous example.
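As a cross-check, the clustering factor the optimizer actually sees can be read straight from the dictionary (the index name depends on what the primary key on T is called in your schema):

```sql
-- shows the stored clustering factor for the indexes on T
select index_name, clustering_factor, num_rows
  from user_indexes
 where table_name = 'T';
```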

Moving further

Take a look at this:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select t1.n n1, t2.n n2 from (select mod(level, 1000)+1 l from dual connect by level <= 100000) l, t t1, t t2 where t1.n=l.l and t2.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 5204
consistent gets (fastpath): 2800

PL/SQL procedure successfully completed
I've simply added the same table to the join twice. Though we were accessing exactly the same blocks of the same table, the optimization didn't span the two copies of the table.

Doesn't work across fetches
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select 1 l from dual connect by level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 1008
consistent gets (fastpath): 1001

PL/SQL procedure successfully completed
Although we were accessing the same row (and hence the same block), we got 1K LIOs. This is explained by 11G doing implicit array fetches of 100 rows at a time, and it also points out that consistent gets (fastpath) cannot operate across fetches (though the same rule applies to "regular" consistent gets as well).
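The 100-row implicit array fetch is what a cursor FOR loop is transformed into at the default plsql_optimize_level; writing the equivalent explicit form makes the fetch boundaries visible. A sketch, assuming the same table T:

```sql
-- explicit equivalent of the cursor FOR loop's implicit 100-row array fetch
declare
  type t_n_tab is table of t.n%type;
  l_n t_n_tab;
  cursor c is
    select n
      from (select 1 l from dual connect by level <= 100000) l, t
     where t.n = l.l;
begin
  open c;
  loop
    -- each fetch call is a separate consistent-read visit to the block:
    -- neither regular nor fastpath consistent gets survive across fetch calls
    fetch c bulk collect into l_n limit 100;
    exit when l_n.count = 0;
  end loop;
  close c;
end;
/
```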

In a nutshell

Overall, this looks like a nice feature. The downside, of course, is less predictability. 11G seems to have introduced these little tricks and features all over the place, which can really help you under certain circumstances but can also make your head spin while you are trying to account for "magical" performance differences...

Added 16-Jan-2009: Take a look at the Concurrency post by Jonathan Lewis for some more details.

Monday, January 12, 2009

Airlines...

Reliability is considered table stakes in the airline industry: something you have to have if you want to play the planes game at all.

I had a flight to Toronto a couple of days ago which I can only describe as fascinating and enlightening.

My flight was scheduled for 8AM and I arrived at Ottawa International Airport around 7:10AM. After checking in and passing the required security checks, I was waiting to board my plane. Almost immediately after I began to wait, the airport intercom said:
- If there is anyone willing to board an earlier flight to Toronto, you can do so.
Well, fine. It is better to be at Pearson early than to wait in Ottawa. So I boarded a flight that was supposed to take off 25 minutes earlier.

As soon as everyone had boarded, they closed the doors and started to move the plane. When, suddenly, boom! I see a lot of smoke coming from the back of the plane. The plane stops very hard, almost instantly. I can smell partially burned jet fuel. A minute later a fire truck comes and stops a few meters from us, doing nothing. I watch people become more and more curious (is the plane on fire? is it time to jump off the plane to avoid being cooked?). The pilot says:
- It seems that we've got some electronics snag here, let me try to reboot the on-board computer to see if that fixes the problem.
Now, that's cool. Ummm... Windows Embedded or what? A couple of minutes later the pilot comes on again:
- It looks like our problem still persists, so I'm going to power off the entire plane and then put the power back on. Let's hope it fixes our problem, because it usually does.
If there is ever a moment when you start wondering whether you should be sitting on the plane at all, this seems to be it. Some people have already stood up and are clearly wondering how they can get off the plane. This is really the time when the stewardess can help everyone:
- We were trying to figure out if there is a possibility to board the other plane but, unfortunately, all morning flights are extremely busy.
What a perfect argument to keep the plane's doors closed. The plane comes back to life as the pilots cycle the power...
- It looks like our problem has been solved, so we are going to proceed to takeoff...
Why do I have that strange feeling that it would have been better left unsolved so we could get off? Anyway, they start to move us again... when, suddenly, boom! I see a lot of smoke coming from the back of the plane. Here we go again; the fire truck comes back. That's enough. Who can show me the exit? The pilot:
- It appears that our problem has recurred: the brakes applied without anything telling them to apply.
Let me think what would have happened had the brakes applied on the runway itself... not rocket science.
- We are going to call the maintenance to see if they can fix the problem, it should take around 25-30 minutes...
Right. While they are calling maintenance, I watch my original flight take off... Some guy comes over and disappears under the plane.
- It looks like maintenance has fixed the problem, and after they are done with the paperwork, we can proceed to takeoff...
Read the book: as Gordon says, if you don't do scheduled maintenance, the plane will schedule it for you. Trying to save on costs? I have no doubt you can make a service so cheap that no one wants to use it. Well, at least they are going to miss one of their KPIs, on-time arrival, so there is hope someone is going to do something about it...

Saturday, January 10, 2009

Using DBMS_RULE_ADM to evaluate dynamic predicates

The best comments are the ones that inspire you to try and learn something new. I received one of those on my recent blog post about Scaling dynamic SQL. Stas, one of my good friends, asked whether I had tried the DBMS_RULE_ADM package for this task. Well, now I have. Stas, you should be careful what you ask for next time, you might get it :)

The first thing we need to do is create an evaluation context:
SQL> declare
2 l_vtl sys.re$variable_type_list;
3 begin
4 l_vtl:=sys.re$variable_type_list(
5 sys.re$variable_type('age', 'number', null, null),
6 sys.re$variable_type('balance', 'number', null, null),
7 sys.re$variable_type('birthdate', 'date', null, null),
8 sys.re$variable_type('today', 'date', null, null)
9 );
10
11 dbms_rule_adm.create_evaluation_context(
12 evaluation_context_name => 'adverts_ctx',
13 variable_types => l_vtl
14 );
15 end;
16 /

PL/SQL procedure successfully completed
We can create the rules themselves now:
SQL> declare
2 l_rules dbms_sql.Varchar2_Table;
3 begin
4 dbms_rule_adm.create_rule_set(
5 rule_set_name => 'adverts'
6 );
7
8 l_rules(1):=':age between 16 and 18';
9 l_rules(2):=':birthdate = :today or :balance > 1000';
10 l_rules(3):=':balance between 100 and 200 and :age > 18';
11
12 for i in 1 .. 3
13 loop
14 dbms_rule_adm.create_rule(
15 rule_name => 'rule_'||to_char(i),
16 condition => l_rules(i),
17 evaluation_context => 'adverts_ctx'
18 );
19
20 dbms_rule_adm.add_rule(
21 rule_name => 'rule_'||to_char(i),
22 rule_set_name => 'adverts'
23 );
24 end loop;
25 end;
26 /

PL/SQL procedure successfully completed
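If you want to re-run the setup from scratch, the objects have to be dropped first. A sketch, assuming the delete_rules and force parameters behave as documented:

```sql
-- drop the rule set together with its rules, then the evaluation context
begin
  dbms_rule_adm.drop_rule_set(
    rule_set_name => 'adverts',
    delete_rules  => true
  );
  dbms_rule_adm.drop_evaluation_context(
    evaluation_context_name => 'adverts_ctx',
    force                   => true
  );
end;
/
```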
Since there will be no SQL executed, I've decided to simply measure wall clock time, given that we can obtain latch wait information from the extended SQL trace:
SQL> create table results
2 (
3 sid number,
4 cs number
5 );

Table created
Here is the test procedure itself:
SQL> create or replace procedure test_dra(
2 p_i in number
3 ) is
4 l_age sys.re$variable_value;
5 l_balance sys.re$variable_value;
6 l_birthdate sys.re$variable_value;
7 l_today sys.re$variable_value;
8 l_vvl sys.re$variable_value_list;
9 l_true sys.re$rule_hit_list;
10 l_maybe sys.re$rule_hit_list;
11 l_time number;
12 begin
13 l_age:=sys.re$variable_value('age', anydata.convertnumber(16));
14 l_balance:=sys.re$variable_value('balance', anydata.convertnumber(1500));
15 l_birthdate:=sys.re$variable_value('birthdate', anydata.convertdate(to_date('20090101', 'yyyymmdd')));
16 l_today:=sys.re$variable_value('today', anydata.convertdate(trunc(sysdate)));
17
18 l_vvl:=sys.re$variable_value_list(l_age, l_balance, l_birthdate, l_today);
19
20 l_true:=sys.re$rule_hit_list();
21 l_maybe:=sys.re$rule_hit_list();
22
23 l_time:=dbms_utility.get_time;
24 dbms_monitor.session_trace_enable(waits => true, binds => false);
25
26 for i in 1 .. p_i
27 loop
28 dbms_rule.evaluate(
29 rule_set_name => 'adverts',
30 evaluation_context => 'adverts_ctx',
31 variable_values => l_vvl,
32 stop_on_first_hit => false,
33 true_rules => l_true,
34 maybe_rules => l_maybe
35 );
36 end loop;
37
38 dbms_monitor.session_trace_disable;
39 insert into results values (sys_context('userenv', 'sid'), dbms_utility.get_time-l_time);
40 end test_dra;
41 /

Procedure created
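The four concurrent sessions can be driven by DBMS_JOB; a minimal sketch of how they might have been submitted (the exact mechanics used for the original test are an assumption):

```sql
-- submit four jobs, each running 100000 evaluations;
-- the jobs only start after the commit
declare
  l_job binary_integer;
begin
  for i in 1 .. 4
  loop
    dbms_job.submit(l_job, 'test_dra(100000);');
  end loop;
  commit;
end;
/
```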
As before, I've used four parallel jobs with 100000 iterations each. Here are the results I've got:
SQL> select * from results;

SID CS
---------- ----------
134 3472
131 3412
133 3205
132 3358
And the tkprof output from one of the jobs:
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 0 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 0 0.00 0.00 0 0 0 0

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
library cache: mutex X 769 0.00 0.04
latch free 39 0.00 0.00
Which is, well, about 6.5 times slower in wall clock time compared to the cached results using the package from my previous blog post:
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 300000 4.85 4.97 0 0 0 300000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 300003 4.85 4.97 0 0 0 300000

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
library cache: mutex X 209 0.00 0.00
cursor: pin S wait on X 98 0.01 1.04
cursor: pin S 360 0.00 0.00
The interesting thing to note here is that DBMS_RULE_ADM seems to have better scalability; however, I can't really say at what point it would be justified (if at all), given the pretty hefty wall clock time difference.

Result Cache and simple queries

I don't really want to beat the dead horse here, however, I came through an interesting observation today.

It turns out that the latest version of APEX (3.1.2; maybe some older versions as well, but I don't have those to check) uses the result_cache hint for queries like these:
SELECT /*+ result_cache */ MESSAGE_TEXT
FROM WWV_FLOW_MESSAGES$
WHERE SECURITY_GROUP_ID = 10
AND MESSAGE_LANGUAGE = :B2
AND NAME = UPPER(:B1 )

SELECT /*+ result_cache */ NLS_LANGUAGE, NLS_TERRITORY, NLS_SORT, NLS_WINDOWS_CHARSET
FROM WWV_FLOW_LANGUAGES
WHERE LANG_ID_UPPER = UPPER(:B1 )
Both of these queries are simple lookups...