Sunday, December 19, 2010

Oracle GoldenGate Trail File Size

When it comes to estimating how big your trail files will be, Oracle documentation suggests to use the following formula:
[log volume in one hour] x [number of hours downtime] x .4 = trail disk space
It is also described as being a conservative estimate so you are likely to archive a better mileage. What I found is that there are some corner cases which can produce some "anomaly" results.

Test Case

I'll start with a specially crafted schema and table names as well as data:
create user uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu identified by "u";

User created.

SQL> grant resource to uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu;

Grant succeeded.

SQL> create table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2  (
  3     n number primary key
  4  );

Table created.

SQL> alter table uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
  2 add supplemental log data (primary key) columns;

Table altered.
I've just created a new trail file which is currently 974 bytes in size:
[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 974 Dec 19 11:50 aa000000
What I'm going to do is insert some data into a table and then measure how much redo as well as trail data were generated:
SQL> set autot traceonly stat
SQL> insert into uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt
      select power(10,10)+(level-1)*power(10,10)
        from dual
        connect by level <= 10000;  2    3    4

10000 rows created.


Statistics
----------------------------------------------------------
        0  recursive calls
      329  db block gets
       31  consistent gets
        0  physical reads
     364612  redo size
      821  bytes sent via SQL*Net to client
      917  bytes received via SQL*Net from client
        3  SQL*Net roundtrips to/from client
        2  sorts (memory)
        0  sorts (disk)
      10000  rows processed

SQL> commit;

Commit complete.
Let's take a look at the trail file size:
[oracle@gg1 dirdat]$ ls -l aa000000
-rw-rw-rw- 1 oracle oinstall 1619890 Dec 19 13:40 aa000000
That is roughly 4.4 times bigger than our redo size and 11 times bigger than Oracle's "conservative" estimate. Your storage provisioning might be in for a surprise.

Trail File

To understand the key factors which resulted in our trail file to be of such outrageous size let's use logdump and take a look inside:
Logdump 11 >open ./dirdat/aa000000
Current LogTrail is /u01/app/oracle/ggs/dirdat/aa000000
Logdump 12 >ghdr on
Logdump 13 >next 2

2010/12/19 13:39:16.632.818 FileHeader           Len   966 RBA 0
Name: *FileHeader*
 3000 01b6 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0...0...GG..TL..1...
 0002 3200 0004 2000 0000 3300 0008 02f1 bdfb 2d3e | ..2... ...3.......->
 74f2 3400 0028 0026 7572 693a 6767 313a 7175 6164 | t.4..(.&uri:gg1:quad
 726f 3a63 6f6d 3a3a 7530 313a 6170 703a 6f72 6163 | ro:com::u01:app:orac
 6c65 3a67 6773 3600 0025 0023 2f75 3031 2f61 7070 | le:ggs6..%.#/u01/app
 2f6f 7261 636c 652f 6767 732f 6469 7264 6174 2f61 | /oracle/ggs/dirdat/a
 6130 3030 3030 3037 0000 0101 3800 0004 0000 0000 | a0000007....8.......

___________________________________________________________________
Hdr-Ind    :     E  (x45)     Partition  :     .  (x04)
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41)
RecLength  :    19  (x0013)   IO Time    : 2010/12/19 13:40:32.000.000
IOType     :     5  (x05)     OrigNode   :   255  (xff)
TransInd   :     .  (x00)     FormatType :     R  (x52)
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00)
AuditRBA   :         45       AuditPos   : 37458516
Continued  :     N  (x00)     RecCount   :     1  (x01)

2010/12/19 13:40:32.000.000 Insert               Len    19 RBA 974
Name: UUUUUUUUUUUUUUUUUUUUUUUUUUUUUU.TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
After  Image:                                             Partition 4   G  b
 0000 000f 0000 000b 3130 3030 3030 3030 3030 30   | ........10000000000
The first record is a standard trail file header and is of little interest to us. We're going to take look at the second record. First of all, as you can see, we've got a fully qualified table name stored in there. We can see the inserted value as well. Now if all that stuff got stored in plain text that could explain it... let's check it out:
[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
> uuuuuuuuuuuuuuuuuuuuuuuuuuuuuu.tttttttttttttttttttttttttttttt \
> aa000000 \
> | wc -l
10000
Indeed, we've got a fully qualified table name appearing 10 thousand times on our trail file! The longer your schema and/or your table names are the bigger your trail file will be. The main reason Oracle Streams are using object identifiers (as well as Oracle's redo) is to avoid getting into exactly this kind of trouble. This makes GoldenGate configuration easier as it doesn't have to store the mapping information on the target database but you also have to pay the price with every row modification which gets captured.

The story doesn't end there, however. Let's take a look at the inserted data. How do you think that got stored?
[oracle@gg1 dirdat]$ grep --binary-files=text -o -i \
> 0000000000 \
> aa000000 \
> | wc -l
10000
That's right - all numbers got stored in plain text. The fundamental problem here is that trail file does not store data types. Everything just gets converted to strings. This helps dealing with heterogeneous data sources but also makes the storage for certain data types to be very inefficient. Again, you have to pay the price somewhere. If you gzip the above trail file you'll get a whopping 25x compression ratio.

You may ask yourself how column names are being stored? The answer is there are no column names in the trail file, just column positions. That's why GoldenGate requires column order to match between source and target databases, otherwise you'll get your data mapped to a wrong column. Add a lack of data types and you might be in for a very interesting results. If column order doesn't match then you'll have to use defgen to get your data across properly.

In a nutshell, watch out for tables/schemas with long names and lots of numeric data.

5 comments:

  1. Anonymous1:39 AM

    This is really good, thanks Alex for your research. I do note that Oracle 11g GG has an RMTHOST parameter for compression -- have you tried this and do you have any recommendations for TCPBUFSIZE ?

    Kind Regards,

    Will

    ReplyDelete
  2. Page 289 in reference guide (http://download.oracle.com/docs/cd/E18101_01/doc.1111/e17791.pdf) provides a formula for calculating TCPBUFSIZE value but I haven't done any performance tests to see how much real world impact it has. I'd imagine that on networks with big latencies larger buffer should provide more benefits so it depends on the circumstances.

    GG uses zlib for data compression so you should see some good benefits, especially in the corner cases like above.

    ReplyDelete
  3. FYI, for my test case the compression ratio was 22x (you can get compression stats using SEND TCPSTATS for your extract process).

    ReplyDelete
  4. I have a question about Flashback Data Archive tables in Goldengate replication. Are these tables usually excluded in an extract like "TABLEEXCLUDE .SYS_FBA_*"?
    What is the approach for replicating and initial load of Flash Back Data Archive tables?
    Here is my problem. I exported a user schema using Oracle DataPump, imported into destination database. GoldenGate abended saying some tables dont exist on dest. I checked tables and there were about 200 tables that were not exported because they are FBDA tables and DataPump just ignores them. So, I recreated them on source with scripts.
    So, how is this done? Source and destination have their own FBDA and these tables should not be neither recreated with scripts on dest or replicated? Or they have to be replicated with contents?

    ReplyDelete
  5. Anonymous11:42 AM

    I need my local trail files to be generated every minute . can you please hekp me with the settings

    ReplyDelete