[log volume in one hour] x [number of hours downtime] x .4 = trail disk spaceIt is also described as being a conservative estimate so you are likely to archive a better mileage. What I found is that there are some corner cases which can produce some "anomaly" results.
I'll start with a specially crafted schema and table names as well as data:
create user uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu identified by "u"; User created. SQL> grant resource to uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu; Grant succeeded. SQL> create table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt 2 ( 3 n number primary key 4 ); Table created. SQL> alter table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt 2 add supplemental log data (primary key) columns; Table altered.I've just created a new trail file which is currently 974 bytes in size:
[oracle@gg1 dirdat]$ ls -l aa000000 -rw-rw-rw- 1 oracle oinstall 974 Dec 19 11:50 aa000000What I'm going to do is insert some data into a table and then measure how much redo as well as trail data were generated:
SQL> set autot traceonly stat SQL> insert into uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt select power(10,10)+(level-1)*power(10,10) from dual connect by level <= 10000; 2 3 4 10000 rows created. Statistics ---------------------------------------------------------- 0 recursive calls 329 db block gets 31 consistent gets 0 physical reads 364612 redo size 821 bytes sent via SQL*Net to client 917 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 10000 rows processed SQL> commit; Commit complete.Let's take a look at the trail file size:
[oracle@gg1 dirdat]$ ls -l aa000000 -rw-rw-rw- 1 oracle oinstall 1619890 Dec 19 13:40 aa000000That is roughly 4.4 times bigger than our redo size and 11 times bigger than Oracle's "conservative" estimate. Your storage provisioning might be in for a surprise.
To understand the key factors which resulted in our trail file to be of such outrageous size let's use logdump and take a look inside:
Logdump 11 >open ./dirdat/aa000000 Current LogTrail is /u01/app/oracle/ggs/dirdat/aa000000 Logdump 12 >ghdr on Logdump 13 >next 2 2010/12/19 13:39:16.632.818 FileHeader Len 966 RBA 0 Name: *FileHeader* 3000 01b6 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1... 0002 3200 0004 2000 0000 3300 0008 02f1 bdfb 2d3e | ..2... ...3.......-> 74f2 3400 0028 0026 7572 693a 6767 313a 7175 6164 | t.4..(.&uri:gg1:quad 726f 3a63 6f6d 3a3a 7530 313a 6170 703a 6f72 6163 | ro:com::u01:app:orac 6c65 3a67 6773 3600 0025 0023 2f75 3031 2f61 7070 | le:ggs6..%.#/u01/app 2f6f 7261 636c 652f 6767 732f 6469 7264 6174 2f61 | /oracle/ggs/dirdat/a 6130 3030 3030 3037 0000 0101 3800 0004 0000 0000 | a0000007....8....... ___________________________________________________________________ Hdr-Ind : E (x45) Partition : . (x04) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 19 (x0013) IO Time : 2010/12/19 13:40:32.000.000 IOType : 5 (x05) OrigNode : 255 (xff) TransInd : . (x00) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 45 AuditPos : 37458516 Continued : N (x00) RecCount : 1 (x01) 2010/12/19 13:40:32.000.000 Insert Len 19 RBA 974 Name: UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU.TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT After Image: Partition 4 G b 0000 000f 0000 000b 3130 3030 3030 3030 3030 30 | ........10000000000The first record is a standard trail file header and is of little interest to us. We're going to take look at the second record. First of all, as you can see, we've got a fully qualified table name stored in there. We can see the inserted value as well. Now if all that stuff got stored in plain text that could explain it... let's check it out:
[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \ > uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt \ > aa000000 \ > | wc -l 10000Indeed, we've got a fully qualified table name appearing 10 thousand times on our trail file! The longer your schema and/or your table names are the bigger your trail file will be. The main reason Oracle Streams are using object identifiers (as well as Oracle's redo) is to avoid getting into exactly this kind of trouble. This makes GoldenGate configuration easier as it doesn't have to store the mapping information on the target database but you also have to pay the price with every row modification which gets captured.
The story doesn't end there, however. Let's take a look at the inserted data. How do you think that got stored?
[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \ > 0000000000 \ > aa000000 \ > | wc -l 10000That's right - all numbers got stored in plain text. The fundamental problem here is that trail file does not store data types. Everything just gets converted to strings. This helps dealing with heterogeneous data sources but also makes the storage for certain data types to be very inefficient. Again, you have to pay the price somewhere. If you gzip the above trail file you'll get a whopping 25x compression ratio.
You may ask yourself how column names are being stored? The answer is there are no column names in the trail file, just column positions. That's why GoldenGate requires column order to match between source and target databases, otherwise you'll get your data mapped to a wrong column. Add a lack of data types and you might be in for a very interesting results. If column order doesn't match then you'll have to use defgen to get your data across properly.
In a nutshell, watch out for tables/schemas with long names and lots of numeric data.