Oracle documentation suggests that in order to utilize in-memory PQ, parallel_degree_policy needs to be set to auto.
_parallel_cluster_cache_policy
One of the parameters influenced by parallel_degree_policy is _parallel_cluster_cache_policy. When using Auto DOP _parallel_cluster_cache_policy will be set to cached. The question then becomes what happens if we set _parallel_cluster_cache_policy=cached while still keeping Manual DOP? Will the system use in-memory PQ?
Test table
Below is a test table setup:
SQL> create table z_test tablespace data as select level n, rpad('*', 4000, '*') v from dual connect by level <= 500000; Table created. SQL> alter table z_test add constraint pk_z_test primary key (n); Table altered. SQL> select bytes/power(1024,2) mb from user_segments where segment_name='Z_TEST'; MB ---------- 3968 SQL> exec dbms_stats.gather_table_stats(user, 'z_test'); PL/SQL procedure successfully completed.The instance is running with 12G buffer cache so it'll have no problems fully caching the above table. All tests were done on my in-house test lab with Oracle 11.2.0.3.3 running inside a Linux VM.
Classic PQ #1
Without setting any additional parameters PQ behave the way it always did -- by utilizing direct path reads directly to the process' memory:
SQL> set timing on SQL> set autot trace exp stat SQL> select /*+ parallel(8) full(z_test) */ count(*) from z_test; Elapsed: 00:00:02.86 Execution Plan ---------------------------------------------------------- Plan hash value: 2128527892 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 18846 (1)| 00:00:01 | | | | | 1 | SORT AGGREGATE | | 1 | | | | | | | 2 | PX COORDINATOR | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | | | 5 | PX BLOCK ITERATOR | | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWC | | | 6 | TABLE ACCESS FULL| Z_TEST | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------- Note ----- - Degree of Parallelism is 8 because of hint Statistics ---------------------------------------------------------- 25 recursive calls 0 db block gets 500525 consistent gets 500000 physical reads 0 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedIt took us about 2.86 seconds to full scan the table which equals 1387MB/s throughput (my test lab storage setup is described here). The above clearly shows that we had to do physical reads in order to access the entire table.
Caching the table
Of course, before testing the in-memory PQ, we need to make sure that our entire table sits in the buffer cache. The easiest way to do it is perform an FTS on the table using an index:
SQL> select /*+ index(z_test,pk_z_test) */ v from z_test; 500000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 579016438 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 1907M| 501K (1)| 00:00:03 | | 1 | TABLE ACCESS BY INDEX ROWID| Z_TEST | 500K| 1907M| 501K (1)| 00:00:03 | | 2 | INDEX FULL SCAN | PK_Z_TEST | 500K| | 1052 (1)| 00:00:01 | ----------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 534311 consistent gets 501105 physical reads 0 redo size 2021185355 bytes sent via SQL*Net to client 367187 bytes received via SQL*Net from client 33335 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 500000 rows processedLet's check to make sure all table blocks are in the buffer cache:
SQL> set autot off SQL> select count(*) from v$bh where objd= (select data_object_id from user_objects where object_name='Z_TEST') and status='xcur'; COUNT(*) ---------- 500001Now we're good to go!
Classic PQ #2
Even with the table entirely cached we still get it using physical reads when utilizing classic PQ -- as it should be:
SQL> set autot trace exp stat SQL> select /*+ parallel(8) full(z_test) */ count(*) from z_test; Elapsed: 00:00:02.83 Execution Plan ---------------------------------------------------------- Plan hash value: 2128527892 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 18846 (1)| 00:00:01 | | | | | 1 | SORT AGGREGATE | | 1 | | | | | | | 2 | PX COORDINATOR | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | | | 5 | PX BLOCK ITERATOR | | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWC | | | 6 | TABLE ACCESS FULL| Z_TEST | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------- Note ----- - Degree of Parallelism is 8 because of hint Statistics ---------------------------------------------------------- 24 recursive calls 0 db block gets 500525 consistent gets 500000 physical reads 0 redo size 526 bytes sent via SQL*Net to client 523 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedIn-memory PQ
Let's flip the parameter responsible for in-memory PQ (while still keeping parallel_degree_policy=manual) and see what happens:
SQL> alter session set "_parallel_cluster_cache_policy"=cached; Session altered. Elapsed: 00:00:00.01 SQL> select /*+ parallel(8) full(z_test) */ count(*) from z_test; Elapsed: 00:00:00.36 Execution Plan ---------------------------------------------------------- Plan hash value: 2128527892 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 18846 (1)| 00:00:01 | | | | | 1 | SORT AGGREGATE | | 1 | | | | | | | 2 | PX COORDINATOR | | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | | | 5 | PX BLOCK ITERATOR | | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWC | | | 6 | TABLE ACCESS FULL| Z_TEST | 500K| 18846 (1)| 00:00:01 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------- Note ----- - Degree of Parallelism is 8 because of hint Statistics ---------------------------------------------------------- 24 recursive calls 0 db block gets 502709 consistent gets 0 physical reads 0 redo size 526 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedFor you see -- the entire table got read from the buffer cache this time and much faster! The fact that we did zero physical IOs shows in-memory PQ kicking in.
Conclusion
It is possible to use in-memory PQ with Manual DOP by setting _parallel_cluster_cache_policy=cached. Of course, always consult with Oracle support before flipping any of the underscore parameters.
No comments:
Post a Comment