From the Oracle Documentation:
For full table access, zone maps allow I/O pruning of data based on the physical location of the data on disk, acting like an anti-index.
Let's start by creating a test table:
SQL> create table t pctfree 95 clustering by linear order (n) as 2 select level n, rpad('x', 200, 'x') v 3 from dual 4 connect by level <= 5000; Table createdI've used a high pctfree setting to make sure the table gets spread out on disk -- each row will occupy it's own block:
SQL> select count(*) num_rows, 2 count(distinct dbms_rowid.rowid_block_number(rowid)) num_blocks 3 from t; NUM_ROWS NUM_BLOCKS ---------- ---------- 5000 5000Zone Maps do not require attribute clustering, however, I did use clustering in this particular case to make sure that value ranges for column N do not overlap when physically stored on disk to make it easier to see what's happening when a Zone Map is created.
Now let's create a Zone Map:
SQL> create materialized zonemap zm$t on t (n); DoneThe first thing to notice is we now got additional table in our schema with the same name as a Zone Map:
SQL> select * from zm$t order by zone_id$; ZONE_ID$ MIN_1_N MAX_1_N ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$ ------------ ---------- ---------- ----------- ----------- ---------- 324421046272 1 748 0 0 748 324421046273 749 1756 0 0 1008 324421046274 1757 2764 0 0 1008 324421046275 2765 3772 0 0 1008 324421046276 3773 4780 0 0 1008 324421046277 4781 5000 0 0 220 6 rows selectedAs you can see we've got six zones defined with most of them covering a range of about thousand rows with the exception of the first and the last ones. I can now map each ZONE_ID$ to it's respective block range on disk:
SQL> select zone_id$, min_block_id, max_block_id, zone_rows$ 2 from ( 3 select zm$t.zone_id$, 4 min(dbms_rowid.rowid_block_number(t.rowid)) min_block_id, 5 max(dbms_rowid.rowid_block_number(t.rowid)) max_block_id, 6 max(zone_rows$) zone_rows$ 7 from t, zm$t 8 where t.n between zm$t.min_1_n and zm$t.max_1_n 9 group by zm$t.zone_id$ 10 ) order by zone_id$; ZONE_ID$ MIN_BLOCK_ID MAX_BLOCK_ID ZONE_ROWS$ ------------ ------------ ------------ ---------- 324421046272 179 1023 748 324421046273 1026 2047 1008 324421046274 2050 3071 1008 324421046275 3074 4095 1008 324421046276 4098 5119 1008 324421046277 5122 5343 220 6 rows selectedBased on a couple more tests I've done the algorithm appears to work itself out until the fist block number in a segment which divides evenly by 1024 after which all subsequent blocks get mapped to 8MB regions. The last map has 221 blocks because that's where the end of the last table block happens to be.
Zone Maps seems to be completely independent from the actual extents on disk. In my tests I was able to get multiple Zone Maps pointing at the same extent and in the case above we have multiple extents covered by the same Zone Map. In all cases zones were 8MB in size (more on that a little bit later). By the way if you're wondering why do we have 1008 rows and a little bit strange starting block_ids (i.e. 1026 instead of 1024 and so on) remember that the fist couple blocks are used for first level bmb and do not store actual table data.
So how is this all pieced together?
SYS_OP_ZONE_ID
SYS_OP_ZONE_ID function computes ZONE_ID$ for a given ROWID value. The calculation appears to be a straightforward math based on the ROWID value and assumes 8MB Zone Map chunks -- this explains why Zone Maps are created adjacent to 8MB boundaries since ZONE_ID$ values in the Zone Map are computed using the same function:
SQL> select rowid, SYS_OP_ZONE_ID(rowid) 2 from t 3 where rownum=1; ROWID SYS_OP_ZONE_ID(ROWID) ------------------ --------------------- AAAS4kAAFAAAACzAAA 324421046272
Essentially if we imagine the entire possible database space divided into 8MB regions this function would return into which region a particular ROWID value would belong.
SYS_ZMAP_FILTER
Let's look at the explain plan for the following query:
SQL> select * from t where n=1; Execution Plan ---------------------------------------------------------- Plan hash value: 2931408918 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%C -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 205 | 1380 |* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| T | 1 | 205 | 1380 -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("N"=1) filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_N" > :1 OR zm."MAX_1_N" < :2) THEN 3 ELSE 2 END END FROM "ROOT"."ZM$T" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),1,1)<3 AND "N"=1)We can see that SYS_ZMAP_FILTER appears to be the function involved in figuring out which Zone Maps needs to be accessed in order to execute the query. The condition inside the query (zm."MIN_1_N" > :1 OR zm."MAX_1_N" < :2) will be used to eliminate Zone Maps which do not have a value we're looking for and is dynamically constructed based on the predicate(-s) we have in the query. From here ZONE_ID$ can be mapped back to ROWID ranges (a reverse operation relative to SYS_OP_ZONE_ID) in order to scan only required portions of the data on disk.
This looks to be a very exciting feature and I can't help but think that it's a little bit disappointing that it's limited to Exadata storage only as it could have been very useful on other systems due to lack of storage indexes support.
HI Timur,
ReplyDeleteI could not create a Zone Map with any non-Exadata storage as it results in an error:
ORA-31969: ZONEMAP not supported for table stored in tablespace of this storage type
Somebody over at Kerry's blog pointed out that licensing guide mentions that the feature is limited to Exa/SC and here it is:
http://docs.oracle.com/database/121/DBLIC/options.htm#DBLIC152
Zone Maps (Available starting with Oracle Database 12c Release 1 (12.1.0.2); Requires Exadata or Supercluster). There seems to be no technical reason for things to be this way as I verified that Zone Maps work even when storage servers are in block I/O mode.
Good find about the scale parameter! So 2^10 (=1024) is the default Zone Map size.
I was a little bit puzzled how SYS_OP_ZONE_ID would be able to tell which scale was used when creating a Zone Map. In my tests the function could accept an arbitrary ROWID (even the ones which did not belong to any tables in the database or were outside of possible ROWID range for a given database). It turned out SYS_OP_ZONE_ID accepts a second parameter (which I initially did not realize it had) to tell it what scale to use for slicing. During the parse time the scale is looked up and, in case it was non-default one, it will be passed into SYS_OP_ZONE_ID.
ReplyDelete