Saturday, February 04, 2012

session_cached_cursors again

Not so long time ago I had to troubleshoot code performance regression issue between 9iR2 and 11gR2.

The issue had been related to a business-critical stored procedure that, for some reason, was performing almost 50% slower when run in 11gR2 compared to 9iR2. When I took a look at the procedure I've discovered that there were, strictly speaking, nothing that could regress. At least not in the commons sense when we're talking about upgrades. All the procedure was doing was a look up using a statement containing UNION ALL.

Below is the artificially created test case which demonstrates the issue.

9iR2 results:

select *
from
 dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where
  :"SYS_B_2"=:"SYS_B_3"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    99999      3.88       4.41          0          0          0           0
Execute  99999      1.23       1.31          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   199998      5.11       5.73          0          0          0           0

11gR2 results:

select *
from
 dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where
  :"SYS_B_2"=:"SYS_B_3"


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    99999      6.14       7.21          0          0          0           0
Execute  99999      1.04       1.38          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   199998      7.19       8.59          0          0          0           0

As you can see from the above figures, 11gR2 spent significantly more time parsing. Given that both sessions had 100% session cursor cache hit (and yes, the procedure didn't make use of bind variables either) and were otherwise run on pretty much identical environments (apart from the Oracle version), such a difference in parse time is certainly unexpected.

It was another time to gear up for a reproducible test case. Perhaps I'm being lucky but it boiled down to a session_cache_cursors setting again. To better demonstrate the point I'm going to run the following anonymous PL/SQL block in my 11gR2 database:

SQL> begin
  2   execute immediate 'alter session set cursor_sharing=force';
  3   for i in 1 .. 10
  4   loop
  5    execute immediate 'alter session set session_cached_cursors='||to_char(i*100);
  6    for j in 1 .. 10000
  7    loop
  8     execute immediate
  9      'select /* test_'||to_char(i*100)||' */ * from dual where '||to_char(j)||'='||to_char(j)||
 10      ' union all select * from dual where '||to_char(j)||'='||to_char(j);
 11    end loop;
 12   end loop;
 13  end;
 14  /
 
PL/SQL procedure successfully completed

What it does is sets cursor_sharing=force and then executes the same statement with different settings for session_cached_cursors -- from 100 to 1000 (with 100 increment). What do you think the results would be?

SQL> select to_number(substr(sql_text, instr(sql_text, 'test')+5, 4)) cursor_cache, executions, cpu_time
  2   from v$sql
  3   where sql_text like '%test%'
  4    and executions=10000
  5   order by 1;
 
CURSOR_CACHE EXECUTIONS   CPU_TIME
------------ ---------- ----------
         100      10000     571916
         200      10000     517928
         300      10000     559908
         400      10000     572909
         500      10000     561912
         600      10000     616915
         700      10000     652899
         800      10000     670886
         900      10000     671892
        1000      10000     702893
 
10 rows selected
Surprisingly enough, rising session_cached_cursors from 100 to 1000 increases the CPU time required to execute the statements from 571916 to 702893! You can also spot how the CPU time increases together with the session_cached_cursors setting.

What happened is 9iR2 was running with session_cached_cursors=1000 and, apparently, it never had any negative effect... until the setting was transferred to 11gR2. It's somewhat hard to speculate what exactly is happening but based on my limited test cases it appears that you need (a) a union all in the statement, (b) run with cursor_sharing=force and (c) have literals in your query. Under these circumstances there appears to be that strange performance degradation observed in 11gR2. But I need to do more investigation on that one. For now all I can say is that, after setting session_cached_cursors=100, 11gR2 exhibited the same results as 9iR2.

***** UPDATE 07-Feb-2012 *****

It appears that only 11.2.0.1 and 11.2.0.2 are affected by this issue (see comments for more details).

***** UPDATE 07-Feb-2012 #2 *****

I've found the bug# for this one: "Bug 12345980 high parse time with cursor_sharing=force when session_cached_cursors set".

10 comments:

  1. Alex,

    I've just run your code on 9.2.0.8 and 11.2.0.3 on Windows 32 bit. There is a significant difference between the 9i and 11g total times, but
    a) the steady increase in time with session_cached_cursors is not visible
    b) most of the time difference appears to be in the cursor setup - as indicated by running the code with 100 executions instead of 10,000

    ReplyDelete
  2. Jonathan,

    can you share the times you observed between 9i and 11g? My test box is running 11.2.0.2 on Linux x64 and customer's system were running 11.2.0.2 on Solaris SPARC. On both systems 9i exhibited no difference depending on number of cached cursors but 11g was very sensitive. I'll try to test with on 11.2.0.3 and see what happens as well.

    ReplyDelete
  3. A couple of thoughts that might provide an opportunity for a bit more discussion and a couple of additional tests:
    * Prior to Oracle 9.2.0.5 the OPEN_CURSORS parameter controlled the number of cursors that could be held open in the session cursor cache for PL/SQL, as of 9.2.0.5 the SESSION_CACHED_CURSORS parameter started controlling the number of PL/SQL cursors that were held open in the session cursor cache (see page 89 in "Oracle Core", also mentioned in Tom Kyte's book if I remember correctly). Specifically, which patch level of 9iR2 were you ?
    * FAST DUAL implementation in 10.1 - could that affect the results?
    * The potential for adaptive cursor sharing - could that affect the results (disabling the feature did not seem to have much of an impact in my testing)
    * What is the value of the STATISTICS_LEVEL parameter?
    * The resolution of the CPU clock timer in the operating system might impact the reliability of the CPU statistics. (non-virtual) Windows systems seem to use CPU clock timer granularity values in the range of 15.600 ms to 15.625 ms, AIX might use a CPU clock timer granularity of 10 ms, and some Linux distributions seem to use a CPU clock timer granularity of 1 ms (see http://hoopercharles.wordpress.com/2011/04/14/brain-teaser-10046-extended-sql-trace-shows-exec-435118472c15600e510-how-is-that-possible/ )

    In general, it does appear that the increasing value of SESSION_CACHED_CURSORS has some impact on the CPU usage of this single SQL statement. In the test it made no difference in the test results if the OPEN_CURSORS parameter was set at 300 or 1,000.

    ReplyDelete
  4. Anonymous2:25 PM

    Alex:

    I've had two problems replying - when using my wordpress single sign I can't get the note to preview or publish this time around, and the "pre" and "code" tags are rejected anyway so I can't show you a tidy output.

    9.2.0.8 - 10,000 cycles:
    The clock granularity is 15,625 microseconds - which you can see in the times varying plus or minus a tick. I found that over repeated tests there was often one test (not always the same one)which was much slower than the rest.


    CURSOR_CACHE EXECUTIONS CPU_TIME
    ------------ ---------- ----------
             100      10000     312500
             200      10000     265625
             300      10000     828125
             400      10000     281250
             500      10000     281250
             600      10000     343750
             700      10000     296875
             800      10000     281250
             900      10000     296875
            1000      10000     281250


    11.2.0.3 - 10,000 cycles
    On average significantly slower - but no trend with session_cached_cursors.

    CURSOR_CACHE EXECUTIONS CPU_TIME
    ------------ ---------- ----------
             100      10000     593750
             200      10000     406250
             300      10000     500000
             400      10000     578125
             500      10000     500000
             600      10000     421875
             700      10000     437500
             800      10000     500000
             900      10000     500000
            1000      10000     484375


    11.2.0.3 - 100 cycles, showing a large fraction of the time from the 10,000 cycles

    CURSOR_CACHE EXECUTIONS CPU_TIME
    ------------ ---------- ----------
             100        100     515625
             200        100     421875
             300        100     437500
             400        100     421875
             500        100     421875
             600        100     421875
             700        100     421875
             800        100     437500
             900        100     421875
            1000        100     437500

    It's interesting to note that in both the 11.2.0.3 cases the very first pass produced the highest times - but I assume this was a side effect of flushing the library cache to repeat the tests.

    Regards
    Jonathan Lewis

    ReplyDelete
  5. I made a slight adjustment to the original script, allowing the PL/SQL script to calculate the CPU usage (in 1/100 of a second) for each change of the SESSION_CACHED_CURSORS value (this portion of the script is repeated 3 times):

    alter system flush shared_pool;

    DECLARE
    CPUStart NUMBER;
    begin
    execute immediate 'alter session set cursor_sharing=force';
    for i in 1 .. 10
    loop
    execute immediate 'alter session set session_cached_cursors='||to_char(i*100);
    CPUStart := DBMS_UTILITY.GET_CPU_TIME;
    for j in 1 .. 10000
    loop
    execute immediate
    'select /* test_'||to_char(i*100)||' */ * from dual where '||to_char(j)||'='||to_char(j)||
    ' union all select * from dual where '||to_char(j)||'='||to_char(j);
    end loop;
    DBMS_OUTPUT.PUT_LINE(to_char(i*100) || ' ' || TO_CHAR(DBMS_UTILITY.GET_CPU_TIME - CPUStart));
    end loop;
    end;
    /

    select to_number(substr(sql_text, instr(sql_text, 'test')+5, 4)) cursor_cache, executions, cpu_time, ELAPSED_TIME
    from v$sql
    where sql_text like '%test%'
    and executions=10000
    order by 1;

    I tested the script as a normal user and as SYS (no automatic bind variable substitution) in 11.1.0.6 and 11.2.0.1 on 64 bit Linux, and 10.2.0.5 and 11.2.0.2 on 64 bit Windows. The CPU usage does not seem to increase when the SESSION_CACHED_CURSORS value increases on 10.2.0.5 and 11.1.0.6. The CPU usage does seem to increase on 11.2.0.1 and 11.2.0.2.

    10.2.0.5 (Windows):
    100 32
    200 32
    300 34
    400 31
    500 31
    600 33
    700 33
    800 31
    900 33
    1000 33

    CURSOR_CACHE EXECUTIONS CPU_TIME ELAPSED_TIME
    ------------ ---------- ---------- ------------
    100 10000 202801 195792
    200 10000 249602 189768
    300 10000 202802 201148
    400 10000 187201 190446
    500 10000 156001 191584
    600 10000 140401 194907
    700 10000 249601 190389
    800 10000 202801 193119
    900 10000 249602 193446
    1000 10000 202801 189718

    11.1.0.6 (Linux):
    100 42
    200 43
    300 42
    400 41
    500 42
    600 42
    700 41
    800 43
    900 42
    1000 41

    11.2.0.1 (Linux):
    100 56
    200 60
    300 63
    400 66
    500 68
    600 72
    700 73
    800 76
    900 79
    1000 81

    11.2.0.2 (Windows):
    100 63
    200 68
    300 69
    400 72
    500 73
    600 78
    700 83
    800 82
    900 84
    1000 89

    CURSOR_CACHE EXECUTIONS CPU_TIME ELAPSED_TIME
    ------------ ---------- ---------- ------------
    100 10000 275963 262742
    200 10000 257959 266516
    300 10000 265961 267187
    400 10000 274958 269595
    500 10000 261969 273130
    600 10000 277954 274499
    700 10000 270959 274332
    800 10000 272956 276603
    900 10000 253964 276764
    1000 10000 291968 278647

    ReplyDelete
  6. Charles,

    thanks for your tests. You seems to be able to observer the same behavior as I did, at least in 11.2.0.1 and 11.2.0.2. I'll try to get some 11.2.0.3 numbers tonight and see whether I'll observer the same results as Jonathan did, i.e. significantly slower times but no correlation with the number of cached cursors.

    ReplyDelete
  7. Jonathan,

    the gap between 9i and 11g results in your tests looks troubling to say the least... Perhaps I could have come up with a better example as you pointed out at the timer resolution detail. Alas the real world scenario was first observed by the application testing team who communicated the regression issue to the RDBMS team. The app team looked at the wall clock times and noticed the slowdown. We did confirm over multiple iterations that decreasing session_cached_cursors from 1000 to 100 brought the results on par with 9i timings. That's why I'm getting a bit worried about the results you've got, does these mean that 11.2.0.3 will always be slower in that particular case? I'll try to get my hands on this tonight or in a next couple of days, as time permits.

    Frankly I don't really know what to do with the sign-in issue as I don't think I have any control other that. I'll look whether there is a setting to allow html into the comments. Thanks for pointing these issues out.

    ReplyDelete
  8. Jonathan,

    I've installed 11.2.0.3 and can confirm your findings -- there is no correlation between session_cached_cursors and timings. I wonder whether there has been a bug which got fixed. Performance wise it's on par with 11.2.0.2 and 100 cached cursors. So far only 11.2.0.1 and 11.2.0.2 appear to be affected (might be some latest PSU fixes that as well but need to find bug# first).

    I'll see if I can dig out 9.2.0.8 somewhere to compare to.

    ReplyDelete
  9. Jonathan, Charles -- I've found a bug# for this one -- "Bug 12345980 high parse time with cursor_sharing=force when session_cached_cursors set".

    ReplyDelete
  10. Alex,

    Nice find locating the bug ID. A search of that bug ID found a couple of related documents that list bugs related to the CURSOR_SHARING and SESSION_CACHED_CURSORS parameters:
    Doc ID 94036.1 CURSOR_SHARING
    Doc ID 30804.1 SESSION_CACHED_CURSORS

    It appears that there are still a couple of unresolved CURSOR_SHARING bugs in 11.2.0.3.

    ReplyDelete