The issue had been related to a business-critical stored procedure that, for some reason, was performing almost 50% slower when run in 11gR2 compared to 9iR2. When I took a look at the procedure I've discovered that there were, strictly speaking, nothing that could regress. At least not in the commons sense when we're talking about upgrades. All the procedure was doing was a look up using a statement containing UNION ALL.
Below is the artificially created test case which demonstrates the issue.
9iR2 results:
select * from dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where :"SYS_B_2"=:"SYS_B_3" call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 99999 3.88 4.41 0 0 0 0 Execute 99999 1.23 1.31 0 0 0 0 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 199998 5.11 5.73 0 0 0 0
11gR2 results:
select * from dual where :"SYS_B_0"=:"SYS_B_1" union all select * from dual where :"SYS_B_2"=:"SYS_B_3" call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 99999 6.14 7.21 0 0 0 0 Execute 99999 1.04 1.38 0 0 0 0 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 199998 7.19 8.59 0 0 0 0
As you can see from the above figures, 11gR2 spent significantly more time parsing. Given that both sessions had 100% session cursor cache hit (and yes, the procedure didn't make use of bind variables either) and were otherwise run on pretty much identical environments (apart from the Oracle version), such a difference in parse time is certainly unexpected.
It was another time to gear up for a reproducible test case. Perhaps I'm being lucky but it boiled down to a session_cache_cursors setting again. To better demonstrate the point I'm going to run the following anonymous PL/SQL block in my 11gR2 database:
SQL> begin 2 execute immediate 'alter session set cursor_sharing=force'; 3 for i in 1 .. 10 4 loop 5 execute immediate 'alter session set session_cached_cursors='||to_char(i*100); 6 for j in 1 .. 10000 7 loop 8 execute immediate 9 'select /* test_'||to_char(i*100)||' */ * from dual where '||to_char(j)||'='||to_char(j)|| 10 ' union all select * from dual where '||to_char(j)||'='||to_char(j); 11 end loop; 12 end loop; 13 end; 14 / PL/SQL procedure successfully completed
What it does is sets cursor_sharing=force and then executes the same statement with different settings for session_cached_cursors -- from 100 to 1000 (with 100 increment). What do you think the results would be?
SQL> select to_number(substr(sql_text, instr(sql_text, 'test')+5, 4)) cursor_cache, executions, cpu_time 2 from v$sql 3 where sql_text like '%test%' 4 and executions=10000 5 order by 1; CURSOR_CACHE EXECUTIONS CPU_TIME ------------ ---------- ---------- 100 10000 571916 200 10000 517928 300 10000 559908 400 10000 572909 500 10000 561912 600 10000 616915 700 10000 652899 800 10000 670886 900 10000 671892 1000 10000 702893 10 rows selectedSurprisingly enough, rising session_cached_cursors from 100 to 1000 increases the CPU time required to execute the statements from 571916 to 702893! You can also spot how the CPU time increases together with the session_cached_cursors setting.
What happened is 9iR2 was running with session_cached_cursors=1000 and, apparently, it never had any negative effect... until the setting was transferred to 11gR2. It's somewhat hard to speculate what exactly is happening but based on my limited test cases it appears that you need (a) a union all in the statement, (b) run with cursor_sharing=force and (c) have literals in your query. Under these circumstances there appears to be that strange performance degradation observed in 11gR2. But I need to do more investigation on that one. For now all I can say is that, after setting session_cached_cursors=100, 11gR2 exhibited the same results as 9iR2.
***** UPDATE 07-Feb-2012 *****
It appears that only 11.2.0.1 and 11.2.0.2 are affected by this issue (see comments for more details).
***** UPDATE 07-Feb-2012 #2 *****
I've found the bug# for this one: "Bug 12345980 high parse time with cursor_sharing=force when session_cached_cursors set".
Alex,
ReplyDeleteI've just run your code on 9.2.0.8 and 11.2.0.3 on Windows 32 bit. There is a significant difference between the 9i and 11g total times, but
a) the steady increase in time with session_cached_cursors is not visible
b) most of the time difference appears to be in the cursor setup - as indicated by running the code with 100 executions instead of 10,000
Jonathan,
ReplyDeletecan you share the times you observed between 9i and 11g? My test box is running 11.2.0.2 on Linux x64 and customer's system were running 11.2.0.2 on Solaris SPARC. On both systems 9i exhibited no difference depending on number of cached cursors but 11g was very sensitive. I'll try to test with on 11.2.0.3 and see what happens as well.
A couple of thoughts that might provide an opportunity for a bit more discussion and a couple of additional tests:
ReplyDelete* Prior to Oracle 9.2.0.5 the OPEN_CURSORS parameter controlled the number of cursors that could be held open in the session cursor cache for PL/SQL, as of 9.2.0.5 the SESSION_CACHED_CURSORS parameter started controlling the number of PL/SQL cursors that were held open in the session cursor cache (see page 89 in "Oracle Core", also mentioned in Tom Kyte's book if I remember correctly). Specifically, which patch level of 9iR2 were you ?
* FAST DUAL implementation in 10.1 - could that affect the results?
* The potential for adaptive cursor sharing - could that affect the results (disabling the feature did not seem to have much of an impact in my testing)
* What is the value of the STATISTICS_LEVEL parameter?
* The resolution of the CPU clock timer in the operating system might impact the reliability of the CPU statistics. (non-virtual) Windows systems seem to use CPU clock timer granularity values in the range of 15.600 ms to 15.625 ms, AIX might use a CPU clock timer granularity of 10 ms, and some Linux distributions seem to use a CPU clock timer granularity of 1 ms (see http://hoopercharles.wordpress.com/2011/04/14/brain-teaser-10046-extended-sql-trace-shows-exec-435118472c15600e510-how-is-that-possible/ )
In general, it does appear that the increasing value of SESSION_CACHED_CURSORS has some impact on the CPU usage of this single SQL statement. In the test it made no difference in the test results if the OPEN_CURSORS parameter was set at 300 or 1,000.
Alex:
ReplyDeleteI've had two problems replying - when using my wordpress single sign I can't get the note to preview or publish this time around, and the "pre" and "code" tags are rejected anyway so I can't show you a tidy output.
9.2.0.8 - 10,000 cycles:
The clock granularity is 15,625 microseconds - which you can see in the times varying plus or minus a tick. I found that over repeated tests there was often one test (not always the same one)which was much slower than the rest.
CURSOR_CACHE EXECUTIONS CPU_TIME
------------ ---------- ----------
100 10000 312500
200 10000 265625
300 10000 828125
400 10000 281250
500 10000 281250
600 10000 343750
700 10000 296875
800 10000 281250
900 10000 296875
1000 10000 281250
11.2.0.3 - 10,000 cycles
On average significantly slower - but no trend with session_cached_cursors.
CURSOR_CACHE EXECUTIONS CPU_TIME
------------ ---------- ----------
100 10000 593750
200 10000 406250
300 10000 500000
400 10000 578125
500 10000 500000
600 10000 421875
700 10000 437500
800 10000 500000
900 10000 500000
1000 10000 484375
11.2.0.3 - 100 cycles, showing a large fraction of the time from the 10,000 cycles
CURSOR_CACHE EXECUTIONS CPU_TIME
------------ ---------- ----------
100 100 515625
200 100 421875
300 100 437500
400 100 421875
500 100 421875
600 100 421875
700 100 421875
800 100 437500
900 100 421875
1000 100 437500
It's interesting to note that in both the 11.2.0.3 cases the very first pass produced the highest times - but I assume this was a side effect of flushing the library cache to repeat the tests.
Regards
Jonathan Lewis
I made a slight adjustment to the original script, allowing the PL/SQL script to calculate the CPU usage (in 1/100 of a second) for each change of the SESSION_CACHED_CURSORS value (this portion of the script is repeated 3 times):
ReplyDeletealter system flush shared_pool;
DECLARE
CPUStart NUMBER;
begin
execute immediate 'alter session set cursor_sharing=force';
for i in 1 .. 10
loop
execute immediate 'alter session set session_cached_cursors='||to_char(i*100);
CPUStart := DBMS_UTILITY.GET_CPU_TIME;
for j in 1 .. 10000
loop
execute immediate
'select /* test_'||to_char(i*100)||' */ * from dual where '||to_char(j)||'='||to_char(j)||
' union all select * from dual where '||to_char(j)||'='||to_char(j);
end loop;
DBMS_OUTPUT.PUT_LINE(to_char(i*100) || ' ' || TO_CHAR(DBMS_UTILITY.GET_CPU_TIME - CPUStart));
end loop;
end;
/
select to_number(substr(sql_text, instr(sql_text, 'test')+5, 4)) cursor_cache, executions, cpu_time, ELAPSED_TIME
from v$sql
where sql_text like '%test%'
and executions=10000
order by 1;
I tested the script as a normal user and as SYS (no automatic bind variable substitution) in 11.1.0.6 and 11.2.0.1 on 64 bit Linux, and 10.2.0.5 and 11.2.0.2 on 64 bit Windows. The CPU usage does not seem to increase when the SESSION_CACHED_CURSORS value increases on 10.2.0.5 and 11.1.0.6. The CPU usage does seem to increase on 11.2.0.1 and 11.2.0.2.
10.2.0.5 (Windows):
100 32
200 32
300 34
400 31
500 31
600 33
700 33
800 31
900 33
1000 33
CURSOR_CACHE EXECUTIONS CPU_TIME ELAPSED_TIME
------------ ---------- ---------- ------------
100 10000 202801 195792
200 10000 249602 189768
300 10000 202802 201148
400 10000 187201 190446
500 10000 156001 191584
600 10000 140401 194907
700 10000 249601 190389
800 10000 202801 193119
900 10000 249602 193446
1000 10000 202801 189718
11.1.0.6 (Linux):
100 42
200 43
300 42
400 41
500 42
600 42
700 41
800 43
900 42
1000 41
11.2.0.1 (Linux):
100 56
200 60
300 63
400 66
500 68
600 72
700 73
800 76
900 79
1000 81
11.2.0.2 (Windows):
100 63
200 68
300 69
400 72
500 73
600 78
700 83
800 82
900 84
1000 89
CURSOR_CACHE EXECUTIONS CPU_TIME ELAPSED_TIME
------------ ---------- ---------- ------------
100 10000 275963 262742
200 10000 257959 266516
300 10000 265961 267187
400 10000 274958 269595
500 10000 261969 273130
600 10000 277954 274499
700 10000 270959 274332
800 10000 272956 276603
900 10000 253964 276764
1000 10000 291968 278647
Charles,
ReplyDeletethanks for your tests. You seems to be able to observer the same behavior as I did, at least in 11.2.0.1 and 11.2.0.2. I'll try to get some 11.2.0.3 numbers tonight and see whether I'll observer the same results as Jonathan did, i.e. significantly slower times but no correlation with the number of cached cursors.
Jonathan,
ReplyDeletethe gap between 9i and 11g results in your tests looks troubling to say the least... Perhaps I could have come up with a better example as you pointed out at the timer resolution detail. Alas the real world scenario was first observed by the application testing team who communicated the regression issue to the RDBMS team. The app team looked at the wall clock times and noticed the slowdown. We did confirm over multiple iterations that decreasing session_cached_cursors from 1000 to 100 brought the results on par with 9i timings. That's why I'm getting a bit worried about the results you've got, does these mean that 11.2.0.3 will always be slower in that particular case? I'll try to get my hands on this tonight or in a next couple of days, as time permits.
Frankly I don't really know what to do with the sign-in issue as I don't think I have any control other that. I'll look whether there is a setting to allow html into the comments. Thanks for pointing these issues out.
Jonathan,
ReplyDeleteI've installed 11.2.0.3 and can confirm your findings -- there is no correlation between session_cached_cursors and timings. I wonder whether there has been a bug which got fixed. Performance wise it's on par with 11.2.0.2 and 100 cached cursors. So far only 11.2.0.1 and 11.2.0.2 appear to be affected (might be some latest PSU fixes that as well but need to find bug# first).
I'll see if I can dig out 9.2.0.8 somewhere to compare to.
Jonathan, Charles -- I've found a bug# for this one -- "Bug 12345980 high parse time with cursor_sharing=force when session_cached_cursors set".
ReplyDeleteAlex,
ReplyDeleteNice find locating the bug ID. A search of that bug ID found a couple of related documents that list bugs related to the CURSOR_SHARING and SESSION_CACHED_CURSORS parameters:
Doc ID 94036.1 CURSOR_SHARING
Doc ID 30804.1 SESSION_CACHED_CURSORS
It appears that there are still a couple of unresolved CURSOR_SHARING bugs in 11.2.0.3.