Monday, January 19, 2009

"LONGHOLD" latch waits on Linux

Take a look at the figure from my previous blog post:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch     5001      0.53      53.71          0          0          0     500000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     5003      0.53      53.71          0          0          0     500000


Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
latch free                                      1       53.20         53.20
What's unusual about it is a single latch wait of 53 seconds without consuming any CPU time. This is something different from the usual
get -> miss -> spin -> sleep
latch acquisition algorithm we are generally used to.

Sequence of events

To remind you, the sequence of events goes like this:
  1. First session: run select /*+ result_cache */ * from t, which places a single big result into the Result Cache memory.
  2. First session: run select count(*) from v$result_cache_memory, which grabs the Result Cache latch for a long time.
  3. Second session: run select /*+ result_cache */ * from t, which forces us to wait until the Result Cache latch is freed up by the first session.
Latch statistics

Let's take a look at v$latch right after we execute the above three steps:
SQL> select gets, misses, sleeps
  2  from v$latch
  3  where name='Result Cache: Latch';

      GETS     MISSES     SLEEPS
---------- ---------- ----------
     53494          1          1
As we can see, the session indeed went to sleep after a single miss. The first thing that may come to mind is that I'm running this on a single-CPU box, where we don't spin on a latch miss (because _spin_count defaults to 1 on single-CPU boxes). This is not the case:
SQL> show parameter cpu_count

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     2
And my database's _spin_count is at its default value of 2000 as well.

Let's look a bit further:
SQL> select location, sleep_count, wtr_slp_count, longhold_count
  2  from v$latch_misses
  3  where parent_name='Result Cache: Latch'
  4  and sleep_count+wtr_slp_count+longhold_count > 0;

LOCATION                       SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT
------------------------------ ----------- ------------- --------------
Result Cache: Serialization12            0             1              0
Result Cache: Serialization16            1             0              1
Note that the LONGHOLD_COUNT column equals 1 for Result Cache: Serialization16. That means we've held the latch for the entire duration of someone else's sleep which, again, conforms to our observation. But what is the underlying mechanism?

Under the hood

Here is an excerpt from our second session's strace output:
getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 728889}, ru_stime={1, 317799}, ...}) = 0
semop(98304, 0x7fbfff5f58, 1) = 0 -- enters the sleep here
times({tms_utime=72, tms_stime=131, tms_cutime=0, tms_cstime=0}) = 429496789
getrusage(RUSAGE_SELF, {ru_utime={0, 729889}, ru_stime={1, 317799}, ...}) = 0
And the first session:
mmap(0x2a972e7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972e7000
mmap(0x2a972f7000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 9, 0) = 0x2a972f7000
semctl(98304, 31, SETVAL, 0x7f00000001) = 0 -- issues upon completion
getrusage(RUSAGE_SELF, {ru_utime={206, 80670}, ru_stime={2, 686591}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={206, 81670}, ru_stime={2, 686591}, ...}) = 0
In other words:
  1. Second session: upon discovering that the latch is busy, it goes to sleep by issuing the
    semop(98304, 0x7fbfff5f58, 1) call.
  2. First session: upon completion, it issues
    semctl(98304, 31, SETVAL, 0x7f00000001) which wakes our second session up.
The first argument is the semid, which means that both sessions are operating on the same semaphore set. Neither semaphore syscall appears if we execute the two sessions separately.

Though we do sleep under the normal latch acquisition algorithm as well, that sleep is based on a timer, not on the latch holder posting us.

Which algorithm?

The interesting question is how the waiter decides which algorithm to use. It could be (A) a special "longhold" mode get for the latch, or (B) the waiter might check how long the latch has already been held. So far I lean towards (A) (a special get mode or a separate flag somewhere), because if we execute the above two sessions concurrently we always get an unusually large number of sleeps, even if we place only a single-row result into the Result Cache memory (thus making sure there are no significant delays selecting from v$result_cache_memory).

Maybe some of you could share other ideas as well.

