Tuesday, January 20, 2009

Latch waits in 11G: queuing

Following up on my previous post about 11G latch waits using semaphore post-wait with a latch holder...

What I was curious to confirm is whether the latch holder posts only the first waiter in the list or all of them.

The results

Here are the results I've got using three sessions:
  1. Session 1: gets RC latch
  2. Session 2: misses the latch, begins to sleep
    semop(753664, 0x7fbfffa348, 1)
  3. Session 3: misses the latch, begins to sleep
    semop(753664, 0x7fbfff5f58, 1)
  4. Session 1: completes, posts session 2
    semctl(753664, 28, SETVAL, 0x1)
  5. Session 2: completes, posts session 3
    semctl(753664, 31, SETVAL, 0x1)
Note that although we used the same semaphore set (753664), sessions 1 and 2 posted different semaphores within that set (28 and 31 respectively), and the waiters were woken up by their respective holders in the order they began to wait.

sleeps + spin_gets = misses ?

That's an ideal case, of course: a just-awakened process might still have to compete for the same latch with the previous holder if that holder decides to grab the same latch again. If the awakened process doesn't manage to get the latch during the spin, it will have to sleep again (thus breaking the above formula). However, proximity to the above equation might be useful as one indicator of whether a latch is potentially subject to the "new" algorithm: this wait model should tend to produce fewer sleeps, and hence fewer spins, per miss, otherwise there would be little point to it.
SQL> select misses, sleeps, spin_gets, sleeps+spin_gets
2 from v$latch
3 where name='Result Cache: Latch';

    MISSES     SLEEPS  SPIN_GETS SLEEPS+SPIN_GETS
---------- ---------- ---------- ----------------
      8208        275       7946             8221
The numbers are pretty close indeed. All this made me a bit more curious, and I decided to take a closer look at what happens in strace when a waiter has to sleep more than once in a row.

I ran two parallel sessions executing a loop which selects from the result cache...
begin
  for i in 1 .. 1000000 loop
    for cur in (select /*+ result_cache */ * from t where n=1) loop
      null;
    end loop;
  end loop;
end;
...made sure the above formula became further distorted, and took a look at the strace output for one of the sessions, which led me to an interesting discovery:
[oracle@ora11gr1a ~]$ strace -e semop,semctl,semtimedop -p 7591
Process 7591 attached - interrupt to quit
semctl(1015808, 30, SETVAL, 0x1) = 0
semctl(1015808, 30, SETVAL, 0x1) = 0
semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = 0
semtimedop(1015808, 0x7fbfff42b8, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable)
semop(1015808, 0x7fbfff5f58, 1) = 0
What we see here is the traditional time-based sleep using the semtimedop syscall. Does that mean Oracle can still choose between the algorithms dynamically? The answer, according to extended SQL trace, seems to be no, as I wasn't able to match these semtimedop syscalls with latch waits in the SQL trace (so these syscalls were used for something else).

P.S. It looks like 11G has invalidated all the pseudo code you may find around about the latch acquisition algorithm Oracle uses.

1 comment:

  1. Alexey Nikulin6:25 a.m.

    Hi, Alex. Very interesting blog.

    Take a look at this link http://blogs.sun.com/chrisg/entry/how_large_should_my_semaphore

    I think it will be interesting for you.
    Considering that information, it is strange that Oracle recommends putting all of an instance's semaphores in one semaphore set.
    Do you have any thoughts about that?