Saturday, January 03, 2009

11G adaptive direct path reads -- what is the cached/dirty blocks threshold?

11G's ability to do direct path reads during full table scans without utilizing PQ has already been covered in a number of places (see this post by Doug Burns, for example).

When do direct path reads start to happen?

A somewhat reliable figure is your _small_table_threshold multiplied by 5 (mentioned by Tanel Poder on oracle-l recently). You can discover it using a quick and dirty test case similar to this:
SQL> create tablespace adr_test datafile size 64m segment space management manual;

Tablespace created

SQL> create table t (v varchar2(100)) pctused 1 pctfree 99 tablespace adr_test;

Table created

SQL> create or replace function get_adr_trsh(
       p_step  in number,
       p_start in number default 0,
       p_stop  in number default null
     ) return number is
       l_prd    number;
       l_cnt    number;
       l_blocks number := 0;
       l_start  number := p_start;
     begin
       execute immediate 'truncate table t';
       loop
         insert /*+ append */ into t
         select rpad('*', 100, '*')
           from dual
        connect by level <= p_step + l_start;
         commit;
         l_blocks := l_blocks + p_step + l_start;
         l_start  := 0;
         execute immediate 'alter system flush buffer_cache';
         select /*+ full(t) */ count(*) into l_cnt from t;
         select value into l_prd
           from v$segment_statistics
          where owner = user
            and object_name = 'T'
            and statistic_name = 'physical reads direct';
         exit when (l_prd > 0 or l_blocks > nvl(p_stop, l_blocks));
       end loop;
       return l_blocks - p_step;
     end;
     /

Function created
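Stripped of the Oracle specifics, the probe is a simple incremental search: grow the segment by a fixed step until the trigger condition fires, then report the last size that did not fire. A Python sketch of the same idea (the function and parameter names here are mine, not from the PL/SQL above):

```python
def probe_threshold(fires, step, start=0, stop=None):
    """Grow `size` by `step` (seeded with `start`) until `fires(size)`
    reports the condition, mirroring get_adr_trsh above.  The answer is
    only accurate to within one `step`."""
    size = 0
    bump = start
    while True:
        size += step + bump   # first pass jumps straight to start + step
        bump = 0
        if fires(size) or (stop is not None and size > stop):
            break
    return size - step        # last size before the condition fired

# Simulate an instance that switches to direct reads above 1570 blocks:
print(probe_threshold(lambda blocks: blocks > 1570, step=10, start=1500))
# -> 1570
```

The same skeleton underlies all three PL/SQL probes in this post; only the "grow" action (insert rows, fetch rows, dirty rows) and the trigger condition change.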
My _small_table_threshold is:
SQL> select ksppstvl
       from x$ksppi x, x$ksppcv y
      where x.indx = y.indx
        and ksppinm = '_small_table_threshold';

which is about 2% of my buffer cache (128MB), so you may expect 11G to switch to direct path reads once the table grows beyond 1570 blocks. Let's check it:
SQL> declare
       l_trsh number;
     begin
       l_trsh := get_adr_trsh(10, 1500, 2000);
       dbms_output.put_line(l_trsh);
     end;
     /


PL/SQL procedure successfully completed
Note that this number is approximate and you can get different results depending on things such as using ASSM vs. MSSM.
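As a sanity check on the arithmetic: assuming an 8 KB block size (the post doesn't state it), a 128 MB cache holds 16384 blocks, and the _small_table_threshold implied by 1570 / 5 is indeed in the neighbourhood of 2% of that:

```python
BLOCK_SIZE = 8 * 1024                             # assumed 8 KB block size
cache_blocks = 128 * 1024 * 1024 // BLOCK_SIZE    # 16384 blocks in a 128 MB cache
implied_stt = 1570 // 5                           # _small_table_threshold implied above
print(cache_blocks, implied_stt, implied_stt / cache_blocks)  # ratio is ~1.9%
```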

What is the cached blocks threshold?

Direct path reads stop happening once a certain number of your table's blocks are already in the buffer cache. Discovering this threshold is fairly easy as well:
SQL> --need this so we can do index rowid scans and cache table blocks
SQL> create index i_t on t (1);

Index created

SQL> create or replace function get_cached_trsh(
       p_start in number default 0,
       p_step  in number default 1
     ) return number is
       cursor l_cur is select /*+ index(t i_t) */ * from t;
       l_v     varchar2(100);
       l_trsh  number := 0;
       l_prd   number := 0;
       l_cnt   number := 0;
       l_start number := p_start;
     begin
       execute immediate 'alter system flush buffer_cache';
       open l_cur;
       loop
         for i in 1 .. p_step + l_start
         loop
           fetch l_cur into l_v;
         end loop;
         l_trsh  := l_trsh + p_step + l_start;
         l_start := 0;
         select /*+ full(t) */ count(*) into l_cnt from t;
         select value into l_cnt
           from v$segment_statistics
          where owner = user
            and object_name = 'T'
            and statistic_name = 'physical reads direct';
         exit when l_cnt = l_prd or l_cur%notfound;
         l_prd := l_cnt;
       end loop;
       close l_cur;
       return l_trsh;
     end;
     /

Function created
Now we can see after how many cached blocks 11G will stop doing direct path reads:
SQL> declare
       l_trsh number;
     begin
       l_trsh := get_cached_trsh(500, 1);
       dbms_output.put_line(l_trsh);
     end;
     /


PL/SQL procedure successfully completed
This happens to be half of the table's blocks. I repeated the above test with a 256MB buffer cache and got 3140 blocks (the number of blocks for direct reads to start happening) and 1568 blocks (the cached blocks threshold) respectively. Please note that the cached blocks threshold does not seem to depend on your buffer cache size (to the degree it can find space, of course).
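The 256MB rerun lines up with the "half the table's blocks" observation, within the granularity of the probe step:

```python
# Figures reported above for the 256 MB buffer-cache rerun
start_threshold = 3140    # table blocks at which direct path reads began
cached_threshold = 1568   # cached blocks at which they stopped
print(start_threshold / 2, cached_threshold)  # 1570.0 vs 1568
```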

What is the dirty blocks threshold?

Doing a direct path read requires a segment-level checkpoint first, which may not be something you'd like to do for the sake of a direct read alone if a lot of the table's blocks are dirty.

Something we can start with:
SQL> create or replace function get_dirty_trsh(
       p_step  in number,
       p_start in number default 0,
       p_stop  in number default null
     ) return number is
       l_trsh  number := 0;
       l_prd   number := 0;
       l_cnt   number := 0;
       l_start number := p_start;
     begin
       execute immediate 'alter system flush buffer_cache';
       loop
         l_trsh := l_trsh + p_step + l_start;
         update t set v = v where rownum <= l_trsh;
         commit;
         l_start := 0;
         select /*+ full(t) */ count(*) into l_cnt from t;
         select value into l_cnt
           from v$segment_statistics
          where owner = user
            and object_name = 'T'
            and statistic_name = 'physical reads direct';
         exit when l_cnt = l_prd or l_trsh > nvl(p_stop, l_trsh);
         l_prd := l_cnt;
       end loop;
       return l_trsh;
     end;
     /

Function created

SQL> declare
       l_trsh number;
     begin
       l_trsh := get_dirty_trsh(1, 350, 400);
       dbms_output.put_line(l_trsh);
     end;
     /

PL/SQL procedure successfully completed.
Which turns out to be 1/4 of the table size.
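Putting the three observations together, the decision rule suggested by these tests can be sketched as follows. This is only a Python model of the observed behaviour, not Oracle's actual algorithm, which almost certainly weighs more factors:

```python
def direct_path_read_expected(table_blocks, cached_blocks, dirty_blocks,
                              small_table_threshold):
    """Model of the observed 11g heuristics; NOT Oracle's real algorithm."""
    if table_blocks <= 5 * small_table_threshold:
        return False   # small table: ordinary cached (scattered) reads
    if cached_blocks > table_blocks / 2:
        return False   # over half the table already in the buffer cache
    if dirty_blocks > table_blocks / 4:
        return False   # segment checkpoint deemed too expensive
    return True

# With _small_table_threshold = 314, a 2000-block table qualifies
# unless too much of it is cached or dirty:
print(direct_path_read_expected(2000, 0, 0, 314))     # True
print(direct_path_read_expected(2000, 1100, 0, 314))  # False
```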

Quick and dirty

Please note that adaptive direct path reads could (and most probably do) take many more variables into account. The above tests used an ad-hoc approach just to get an idea of the potential factors. Things like system statistics, tablespace block sizes, delayed block cleanouts, etc. have the potential to interfere here.

SQL> select * from v$version;

Oracle Database 11g Enterprise Edition Release - 64bit Production
PL/SQL Release - Production
CORE Production
TNS for 64-bit Windows: Version - Production
NLSRTL Version - Production


  1. Ha-ha, I have been puzzling over the strange behaviour of my new 11g instance since this morning, wondering why Oracle does a non-parallel full scan with 'direct path read' instead of 'scattered read'. And when I decided to relax and read your blog, what did I see? The answer to my question! :)

  2. BTW, it seems the "direct path" improvements concern not only queries.

  3. Sorry, false start :)
    So, about the "direct path" improvements:

    PARSING IN CURSOR #4 len=37 dep=1 uid=65 oct=2 lid=65 tim=5210765244335 hv=3962083006 ad='3cff93468' sqlid='c5psj0gq2j5py'
    insert /*+append*/into test values(1)
    PARSE #4:c=20000,e=3371,p=1,cr=16,cu=0,mis=1,r=0,dep=1,og=1,plh=1341712891,tim=5210765244335
    WAIT #4: nam='control file sequential read' ela= 1058 file#=0 block#=1 blocks=1 obj#=-1 tim=5335823633819
    WAIT #4: nam='control file sequential read' ela= 2468 file#=0 block#=16 blocks=1 obj#=-1 tim=5335823636426
    WAIT #4: nam='control file sequential read' ela= 555 file#=0 block#=18 blocks=1 obj#=-1 tim=5335823637061
    WAIT #4: nam='direct path write' ela= 856 file number=5 first dba=2177430 block cnt=1 obj#=-1 tim=5335823638598
    WAIT #4: nam='direct path sync' ela= 987 File number=5 Flags=512 p3=0 obj#=-1 tim=5335823639639
    EXEC #4:c=0,e=0,p=0,cr=0,cu=15,mis=0,r=1,dep=1,og=1,plh=1341712891,tim=5210765244335
    STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='LOAD AS SELECT (cr=0 pr=0 pw=1 time=0 us)'
    STAT #4 id=2 cnt=1 pid=1 pos=1 obj=0 op='BULK BINDS GET (cr=0 pr=0 pw=0 time=0 us)'
    CLOSE #4:c=0,e=0,dep=1,type=3,tim=5210765244335
    CLOSE #2:c=0,e=0,dep=1,type=3,tim=5210765273600

  4. Sasha,

    that's a very fascinating comment!

    I'm not even sure if that's a good change.

    See -- direct path writes always go above the HWM, which means that by executing a loop like this:

    for i in 1 .. 1000 loop
    insert /*+ append */ into t values (i);
    end loop;

    you will end up allocating 1000 blocks above the HWM.

    The other problem is that I've seen a lot of legacy code where developers assumed that "append = fast" and put the hint in exactly such cases, but since it was ignored, Oracle was kind of saving them from shooting themselves in the foot. Now they might be in for a surprise after upgrading to 11G...

  5. This is a great explanation, and I still haven't decided whether to love it or hate it. I did get bitten by this, where an application was doing 33 simultaneous direct path reads of the same table. Scattered reads would have shared the blocks, and all would have been well. With the direct path reads, each query read from disk, and together they eventually overloaded the FA.
    Do you love it or hate it?

  6. Bryan Grenn, 2:45 p.m.

    I have been playing with 11gR2 and I am trying to use DBReplay to compare sequential reads with direct path reads for my application. I've been playing with the event in my session, and I haven't had any luck getting the direct path reads to stop. Do you know if 11gR2 works differently here?

  7. Bryan,

    regarding the "love/hate" relationship -- unfortunately, it's somewhat hard to talk in absolutes, as every feature probably has its place, and there is no doubt you may find a situation where a particular feature performs poorly.

    That being said, sometimes I have a sense that some features get "biased" towards Exadata.

    11GR2 could have changed the algorithm, but I'll need to take a closer look first.