Zone Maps is a new feature that got
officially introduced in 12.1.0.2.0 so I've decided to take a closer look.
From the Oracle Documentation:
For full table access, zone maps allow I/O pruning of data based on the physical location of the data on disk, acting like an anti-index.
Let's start by creating a test table:
SQL> create table t pctfree 95 clustering by linear order (n) as
2 select level n, rpad('x', 200, 'x') v
3 from dual
4 connect by level <= 5000;
Table created
I've used a high
pctfree setting to make sure the table gets spread out on disk -- each row will occupy it's own block:
SQL> select count(*) num_rows,
2 count(distinct dbms_rowid.rowid_block_number(rowid)) num_blocks
3 from t;
NUM_ROWS NUM_BLOCKS
---------- ----------
5000 5000
Zone Maps do not require attribute clustering, however, I did use clustering in this particular case to make sure that value ranges for column
N do not overlap when physically stored on disk to make it easier to see what's happening when a Zone Map is created.
Now let's create a Zone Map:
SQL> create materialized zonemap zm$t on t (n);
Done
The first thing to notice is we now got additional table in our schema with the same name as a Zone Map:
SQL> select * from zm$t order by zone_id$;
ZONE_ID$ MIN_1_N MAX_1_N ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
------------ ---------- ---------- ----------- ----------- ----------
324421046272 1 748 0 0 748
324421046273 749 1756 0 0 1008
324421046274 1757 2764 0 0 1008
324421046275 2765 3772 0 0 1008
324421046276 3773 4780 0 0 1008
324421046277 4781 5000 0 0 220
6 rows selected
As you can see we've got six zones defined with most of them covering a range of about thousand rows with the exception of the first and the last ones. I can now map each ZONE_ID$ to it's respective block range on disk:
SQL> select zone_id$, min_block_id, max_block_id, zone_rows$
2 from (
3 select zm$t.zone_id$,
4 min(dbms_rowid.rowid_block_number(t.rowid)) min_block_id,
5 max(dbms_rowid.rowid_block_number(t.rowid)) max_block_id,
6 max(zone_rows$) zone_rows$
7 from t, zm$t
8 where t.n between zm$t.min_1_n and zm$t.max_1_n
9 group by zm$t.zone_id$
10 ) order by zone_id$;
ZONE_ID$ MIN_BLOCK_ID MAX_BLOCK_ID ZONE_ROWS$
------------ ------------ ------------ ----------
324421046272 179 1023 748
324421046273 1026 2047 1008
324421046274 2050 3071 1008
324421046275 3074 4095 1008
324421046276 4098 5119 1008
324421046277 5122 5343 220
6 rows selected
Based on a couple more tests I've done the algorithm appears to work itself out until the fist block number in a segment which divides evenly by 1024 after which all subsequent blocks get mapped to 8MB regions. The last map has 221 blocks because that's where the end of the last table block happens to be.
Zone Maps seems to be completely independent from the actual extents on disk. In my tests I was able to get multiple Zone Maps pointing at the same extent and in the case above we have multiple extents covered by the same Zone Map. In all cases zones were 8MB in size (more on that a little bit later). By the way if you're wondering why do we have 1008 rows and a little bit strange starting block_ids (i.e. 1026 instead of 1024 and so on) remember that the fist couple blocks are used for first level bmb and do not store actual table data.
So how is this all pieced together?
SYS_OP_ZONE_ID
SYS_OP_ZONE_ID function computes
ZONE_ID$ for a given
ROWID value. The calculation appears to be a straightforward math based on the
ROWID value and assumes 8MB Zone Map chunks -- this explains why Zone Maps are created adjacent to 8MB boundaries since
ZONE_ID$ values in the Zone Map are computed using the same function:
SQL> select rowid, SYS_OP_ZONE_ID(rowid)
2 from t
3 where rownum=1;
ROWID SYS_OP_ZONE_ID(ROWID)
------------------ ---------------------
AAAS4kAAFAAAACzAAA 324421046272
Essentially if we imagine the entire possible database space divided into 8MB regions this function would return into which region a particular
ROWID value would belong.
SYS_ZMAP_FILTER
Let's look at the explain plan for the following query:
SQL> select * from t where n=1;
Execution Plan
----------------------------------------------------------
Plan hash value: 2931408918
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%C
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 205 | 1380
|* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| T | 1 | 205 | 1380
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("N"=1)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_N" > :1 OR zm."MAX_1_N"
< :2) THEN 3 ELSE 2 END END FROM "ROOT"."ZM$T" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY
zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),1,1)<3 AND "N"=1)
We can see that
SYS_ZMAP_FILTER appears to be the function involved in figuring out which Zone Maps needs to be accessed in order to execute the query. The condition inside the query (zm."MIN_1_N" > :1 OR zm."MAX_1_N" < :2) will be used to eliminate Zone Maps which do not have a value we're looking for and is dynamically constructed based on the predicate(-s) we have in the query. From here
ZONE_ID$ can be mapped back to
ROWID ranges (a reverse operation relative to
SYS_OP_ZONE_ID) in order to scan only required portions of the data on disk.
This looks to be a very exciting feature and I can't help but think that it's a little bit disappointing that it's limited to Exadata storage only as it could have been very useful on other systems due to lack of storage indexes support.