The UDF pragma tells the compiler that the PL/SQL unit is a user defined function that is used primarily in SQL statements, which might improve its performance.I though it would be very cool to try it out with my HyperLogLog post I did recently and see if it results in any measurable performance improvement.
Test Table
I'll use the same test table as I did in my original post:
SQL> create table z_hll_test as
2 select dbms_random.string('x', 4)||rpad('x', 500, 'x') n
3 from dual
4 connect by level <= 1000000;
Table created
SQL> alter table z_hll_test cache;
Table altered
Note that I'm explicitly setting the table to cache in order to make in-memory PQ kick in and eliminate disk I/O as a factor from my test case. For each test I made sure that no physical I/O has happened (including temp I/O for native distinct test).
Regular Function
create or replace function num_zeroes(
p_n binary_integer
) return binary_integer deterministic is
l_t binary_integer;
l_b binary_integer;
begin
--assume 32-bit hash value, 10-bits for bucket
if (p_n = 0) then return 22; end if;
l_t := 1;
l_b := 0;
while ( bitand(p_n,l_t) = 0 )
loop
l_t := l_t*2;
l_b := l_b+1;
end loop;
return l_b;
end num_zeroes;
SQL> select
2 case
3 when hll <= 2560 and zeroes > 0 then round(1024*ln(1024*1/zeroes))
4 when hll > 1/30 * power(2,32) then round(-power(2,32) * ln(1 - hll/power(2,32)))
5 else round(hll)
6 end num_distinct
7 from (
8 select 0.72054 * (1048576/(current_sum+zeroes)) hll, zeroes
9 from (
10 select sum(1/power(2, val)) current_sum, (1024-count(*)) zeroes
11 from (
12 select /*+ parallel(z 4) */ mod(ora_hash(n), 1024) bucket,
13 max(num_zeroes(trunc(ora_hash(n)/1024)))+1 val
14 from z_hll_test z
15 group by mod(ora_hash(n), 1024)
16 )
17 )
18 );
NUM_DISTINCT
------------
748175
Executed in 0.889 seconds
Pragma UDF Function
create or replace function num_zeroes(
p_n binary_integer
) return binary_integer deterministic is
pragma udf;
l_t binary_integer;
l_b binary_integer;
begin
--assume 32-bit hash value, 10-bits for bucket
if (p_n = 0) then return 22; end if;
l_t := 1;
l_b := 0;
while ( bitand(p_n,l_t) = 0 )
loop
l_t := l_t*2;
l_b := l_b+1;
end loop;
return l_b;
end num_zeroes;
SQL> select
2 case
3 when hll <= 2560 and zeroes > 0 then round(1024*ln(1024*1/zeroes))
4 when hll > 1/30 * power(2,32) then round(-power(2,32) * ln(1 - hll/power(2,32)))
5 else round(hll)
6 end num_distinct
7 from (
8 select 0.72054 * (1048576/(current_sum+zeroes)) hll, zeroes
9 from (
10 select sum(1/power(2, val)) current_sum, (1024-count(*)) zeroes
11 from (
12 select /*+ parallel(z 4) */ mod(ora_hash(n), 1024) bucket,
13 max(num_zeroes(trunc(ora_hash(n)/1024)))+1 val
14 from z_hll_test z
15 group by mod(ora_hash(n), 1024)
16 )
17 )
18 );
NUM_DISTINCT
------------
748175
Executed in 0.593 seconds
Pragma UDF gives us ~33% performance boost which is not too bad considering we didn't have to do anything else. However, that's not the most interesting part -- let's take a look at the native distinct next.
Native Distinct
SQL> select /*+ parallel(z 4) */ count(distinct n) from z_hll_test z;
COUNT(DISTINCTN)
----------------
753204
Executed in 0.983 seconds
Note that this was an optimal execution!
Summary
Let's summarize results in a table:
| Regular Function | Pragma UDF Function | Native Distinct |
| 0.889 | 0.593 | 0.983 |
As you can see pragma udf actually beats native implementation by a considerable margin which is very impressive given the fact that distict had an optimal execution.
