Wednesday, January 14, 2009

Consistent gets from cache (fastpath)

There is an intresting optimization introduced in 11G regarding the way blocks are accessed in the buffer cache.

Let's start with a simple example to demonstrate my point.

10GR2 example
SQL> create table t (
2 n number,
3 v varchar2(100),
4 constraint pk_n primary key (n)
5 );

Table created

SQL>
SQL> insert into t
2 select level, rpad('*', 100, '*')
3 from dual
4 connect by level <= 1000;

1000 rows inserted

SQL> commit;

Commit complete
I'm going to create the procedure to return number of consistent gets in my session:
SQL> create or replace procedure get_cg(
2 p_cg out number
3 ) is
4 begin
5 select ms.value end into p_cg
6 from v$mystat ms, v$statname sn
7 where ms.STATISTIC#=sn.STATISTIC#
8 and sn.NAME='consistent gets';
9 end get_cg;
10 /

Procedure created
Let's execute a simple select and measure the number of consistent gets:
SQL> declare
2 l_cg_b number;
3 l_cg_a number;
4 begin
5 get_cg(l_cg_b);
6 for cur in (select n from (select mod(level, 1000)+1 l from dual connect by
level <= 100000) l, t where t.n=l.l)
7 loop
8 null;
9 end loop;
10 get_cg(l_cg_a);
11 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
12 end;
13 /

consistent gets: 100012

PL/SQL procedure successfully completed
This is somewhat expected number, we did an index lookup (a unique scan) for all 100K rows returned from our connect by query.

11GR1 example

Let's do the same example in 11G. I've slightly modified the procedure to return two statistics -- consistent gets and consistent gets from cache (fastpath):
SQL> create or replace procedure get_cg(
2 p_cg out number,
3 p_cgfp out number
4 ) is
5 begin
6 select max(case sn.NAME when 'consistent gets' then ms.value end),
7 max(case sn.NAME when 'consistent gets from cache (fastpath)'
then ms.value end)
8 into p_cg, p_cgfp
9 from v$mystat ms, v$statname sn
10 where ms.STATISTIC#=sn.STATISTIC#
11 and sn.NAME in ('consistent gets', 'consistent gets from
cache (fastpath)');
12 end get_cg;
13 /

Procedure created

SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select mod(level, 1000)+1 l from dual connect by
level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16
/
consistent gets: 2602
consistent gets (fastpath): 1400

PL/SQL procedure successfully completed
That's quite a difference! 2602 consistent gets for 11GR1 versus ~100K for 10GR2. Note a new consistent gets (fastpath) statistic appearing in this test case.

A slightly different example

To make you a bit more curious, here is a slightly different example executed on 11GR1:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select trunc(dbms_random.value(1, 1000)) l from
dual connect by level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-l_cgfp_b));
15 end;
16 /
consistent gets: 101998
consistent gets (fastpath): 2934

PL/SQL procedure successfully completed
As soon as we've started reading index blocks in a random fashion, the effect, well, disappeared. Thus far optimization seems to be kicking in only if we repeaditly accessing the same block over and over again.

Another slightly different example

10GR2:
SQL> declare
2 l_cg_b number;
3 l_cg_a number;
4 begin
5 get_cg(l_cg_b);
6 for cur in (select v from (select mod(level, 1000)+1 l from dual connect by
level <= 100000) l, t where t.n=l.l)
7 loop
8 null;
9 end loop;
10 get_cg(l_cg_a);
11 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
12 end;
13 /
consistent gets: 201001

PL/SQL procedure successfully completed
What we are doing here is reading the data from a table itself (not only from index). Again, that's expected number. We had to do an index unique scan followed by table access by rowid, resulting in two LIOs per returned row. Now, take a look what happens in 11GR1:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select v from (select mod(level, 1000)+1 l from dual connect by
level <= 100000) l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-
l_cgfp_b));
15 end;
16 /
consistent gets: 102602
consistent gets (fastpath): 1400

PL/SQL procedure successfully completed
In other words, index blocks were "neutralized", but table blocks were not. Another point is that optimization still works for index blocks even though we accessing table blocks in between.

Works only for index blocks?

That's really simple to verify using the following example:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select v from (select case mod(level, 2) when 0 then
'AAADq1AAEAAAAJHAAA' else 'AAADq1AAEAAAAJHAAB' end l from dual connect by level <=
100000) l, t where t.rowid=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-
l_cgfp_b));
15 end;
16 /
consistent gets: 1000
consistent gets (fastpath): 1000

PL/SQL procedure successfully completed
The same example in 10GR2, of course, produces ~100K LIOs, so the optimization seems to be kicking in for table blocks as well. Further, the behavior in the previous example can not be explained by table blocks being randomly distributed since the clustering factor is perfect:
SQL> select sum(bn_diff)
2 from (
3 select case when bn!=lag(bn, 1, bn) over (order by n)then 1 else 0
end bn_diff
4 from (select n, dbms_rowid.rowid_block_number(rowid) bn from
t)
5 );

SUM(BN_DIFF)
------------
15

SQL> select count (distinct dbms_rowid.rowid_block_number(rowid)) from t;

COUNT(DISTINCTDBMS_ROWID.ROWID
------------------------------
16
Perhaps, there is some overhead associated with fastpath and the logic assumes a bad clustering factor by default (which works for majority of cases) thus doesn't tries to do a fastpath for table blocks in our previous example.

Moving further

Take a look at this:
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select t1.n n1, t2.n n2 from (select mod(level, 1000)+1 l from
dual connect by level <= 100000) l, t t1, t t2 where t1.n=l.l and t2.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-
l_cgfp_b));
15 end;
16 /
consistent gets: 5204
consistent gets (fastpath): 2800

PL/SQL procedure successfully completed
I've simply added the same table into a join twice. Though we were accessing exactly the same blocks from the same table, the optimization didn't span both tables.

Doesn't work across fetches
SQL> declare
2 l_cg_b number;
3 l_cgfp_b number;
4 l_cg_a number;
5 l_cgfp_a number;
6 begin
7 get_cg(l_cg_b, l_cgfp_b);
8 for cur in (select n from (select 1 l from dual connect by level <= 100000)
l, t where t.n=l.l)
9 loop
10 null;
11 end loop;
12 get_cg(l_cg_a, l_cgfp_a);
13 dbms_output.put_line('consistent gets: '||to_char(l_cg_a-l_cg_b));
14 dbms_output.put_line('consistent gets (fastpath): '||to_char(l_cgfp_a-
l_cgfp_b));
15 end;
16 /
consistent gets: 1008
consistent gets (fastpath): 1001

PL/SQL procedure successfully completed
While we were accessing the same row (hence, the block), we got 1K LIOs. It is explained by 11G doing implicit array fetches by 100 rows at a time and also points out at the fact that consistent gets (fastpath) can not operate across fetches (though the same rule applies for "regular" consistent gets as well).

In a nutshell

Overall, this looks like a nice feature. The downside, of course, is less predictability. 11G seems to have introduces these little tricks and features all over the place, which can really help you under certain circumstances but can also make your head spin while trying to figure out for "magical" performance differences...

Added 16-Jan-2009: Take a look at Concurrency post by Jonathan Lewis for some more details.

3 comments:

  1. Alex,

    while re-reading your post, I was wondering about this comment:

    -- quote --
    consistent gets: 5204
    consistent gets (fastpath): 2800

    I've simply added the same table into a join twice. Though we were accessing exactly the same blocks from the same table, the optimization didn't span both tables.
    -- quote --

    If I understand you correctly, you mean to say that it didn't reduce the number of consistent gets to that accessing the index only once. (2600/1400)

    But it still seems to work for each instance of the table/index independently, or am I getting this wrong? Otherwise we would have seen again 100K LIOs for a single instance...

    Regards,
    Randolf

    Oracle related stuff blog:
    http://oracle-randolf.blogspot.com/

    SQLTools++ for Oracle (Open source Oracle GUI for Windows):
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.net/projects/sqlt-pp/

    ReplyDelete
  2. Randolf,

    you got it right

    if we take a look at the numbers with one table:

    consistent gets: 2602
    consistent gets (fastpath): 1400

    we'll notice that using the same table twice with different aliases doubled the above numbers, i.e. this is what I meant by "it doesn't span"

    ReplyDelete
  3. FYI. The reduction in consistent gets is actually due to a separate internal optimization. The fastpath gets allows those fewer consistent gets to use a shorter code path.

    ReplyDelete