HASH GROUP BY vs SORT GROUP BY
Consider the following example:
SQL> create table deptLet's say we want to output department names along with how many employees are there:
2 (
3 dept_id number primary key,
4 dept_name varchar2(100)
5 ) organization index;
Table created
SQL> insert /*+ append */ into dept
2 select level, dbms_random.string('x', 100)
3 from dual
4 connect by level <= 10000;
10000 rows inserted
SQL> create table emp
2 (
3 emp_id number primary key,
4 dept_id references dept (dept_id),
5 emp_name varchar2(100)
6 );
Table created
SQL> insert /*+ append */ into emp
2 select level, trunc(dbms_random.value(1, 10000)), dbms_random.string('x', 100)
3 from dual
4 connect by level <= 100000;
100000 rows inserted
SQL> commit;
Commit complete
SQL> exec dbms_stats.gather_table_stats(user, 'dept');
PL/SQL procedure successfully completed
SQL> exec dbms_stats.gather_table_stats(user, 'emp');
PL/SQL procedure successfully completed
SQL> select /*+ gather_plan_statistics */ count(*)Note that 11G did "Group by Placement" automatically (Jonathan Lewis wrote an article about it) and our query performed 11636 consistent gets.
2 from (
3 select /*+ no_merge */ d.dept_name, count(*) cnt
4 from emp e, dept d
5 where e.dept_id=d.dept_id
6 group by d.dept_name
7 );
COUNT(*)
----------
9999
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID djysxbcmwwxj3, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( select /*+
no_merge */ d.dept_name, count(*) cnt from emp e, dept d where
e.dept_id=d.dept_id group by d.dept_name )
Plan hash value: 1432452646
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08 | 11636 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 11636 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | | | |
| 3 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.06 | 11636 | 2058K| 999K| 2497K (0)|
| 4 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.02 | 11636 | | | |
| 5 | VIEW | VW_GBC_10 | 1 | 9999 | 9999 |00:00:00.02 | 1635 | | | |
| 6 | HASH GROUP BY | | 1 | 9999 | 9999 |00:00:00.02 | 1635 | 1207K| 1207K| 2496K (0)|
| 7 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 8 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 10001 | | | |
----------------------------------------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("ITEM_1"="D"."DEPT_ID")
27 rows selected.
Now, take a look at what happens if we rewrite the query to use a sort group by instead:
SQL> select /*+ gather_plan_statistics */ count(*)What I did there is performed the same transformation myself but replaced hash group by with a sort group by. Note more than 4 times decrease in the amount of consistent gets. By looking at the plan statistics, you can tell why. Although both group by's performed the same amount of consistent gets (1635), it is a nested loops join with dep which produced all the difference -- 10001 for hash group by versus only 989 for a sort group by.
2 from (
3 with e as
4 (
5 select dept_id, count(*) cnt
6 from emp e
7 group by dept_id
8 order by dept_id
9 )
10 select /*+ no_merge */ d.dept_name, e.cnt
11 from e, dept d
12 where e.dept_id=d.dept_id
13 );
COUNT(*)
----------
9999
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2utq5vammnwa4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from ( with e as (
select dept_id, count(*) cnt from emp e group by dept_id order
by dept_id ) select /*+ no_merge */ d.dept_name, e.cnt from e, dept
d where e.dept_id=d.dept_id )
Plan hash value: 2732217545
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.10 | 2624 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.10 | 2624 | | | |
| 2 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 3 | NESTED LOOPS | | 1 | 9999 | 9999 |00:00:00.08 | 2624 | | | |
| 4 | VIEW | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | | | |
| 5 | SORT GROUP BY | | 1 | 9999 | 9999 |00:00:00.08 | 1635 | 549K| 549K| 487K (0)|
| 6 | TABLE ACCESS FULL| EMP | 1 | 100K| 100K|00:00:00.01 | 1635 | | | |
|* 7 | INDEX UNIQUE SCAN | SYS_IOT_TOP_15648 | 9999 | 1 | 9999 |00:00:00.01 | 989 | | | |
---------------------------------------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("E"."DEPT_ID"="D"."DEPT_ID")
27 rows selected.
Unless you've heard about consistent gets from cache (fastpath) optimization, the above results may produce quite a bit of surprise for you. Note that due to a sorting, which has to be performed by our second query, the first query still performs better. However, in the environments which are wreaking havoc on CBC latches, the additional savings on number of consistent gets may have a potential to alleviate additional sorting expenses. In other words, the lesser are expenses for outer resultset sorting, the more appealing this could be.
Of course, before you even consider this as an optimization opportunity, keep in mind that you're relying on a specific feature which may narrow you down to a specific dot releases (or even patches) as it may change (or even completely disappear) in the next versions.