Every time this data loading process executed, it experienced strange slowdowns at seemingly random points in time. The performance drops were substantial enough to prompt further investigation. Upon closer examination of the ASH data, it turned out that all of the slowdowns were due to latch: row cache objects contention.
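For context, one quick way to confirm which wait event dominates during such slow periods is to aggregate recent ASH samples. A minimal sketch (the time window is illustrative and should be narrowed to the affected interval):

select event, count(*) samples
  from v$active_session_history
 where session_state = 'WAITING'
   and sample_time > sysdate - interval '1' hour  -- adjust to the slowdown window
 group by event
 order by samples desc;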
The row cache objects latch protects the dictionary cache. The first thing to figure out was whether most of the contention was coming from a particular row cache objects child latch:
SQL> select latch#, child#, sleeps
2 from v$latch_children
3 where name='row cache objects'
4 and sleeps > 0
5 order by sleeps desc;
LATCH# CHILD# SLEEPS
---------- ---------- ----------
270 1 24241645
270 5 523
270 4 52
The first child certainly doesn't look good, given the number of sleeps it experienced compared to all the other child latches. Once we have the troublesome child latch identified, we can move on and see which type of dictionary cache it protects:

SQL> select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate#
2 from v$rowcache r,x$kqrst s
3 where r.cache#=s.kqrstcid
4 order by 1,4,5;
LATCH# CACHE# NAME TYPE SUBORDINATE#
------- ------ --------------------------------- ----------- ------------
1 3 dc_rollback_segments PARENT
2 1 dc_free_extents PARENT
3 4 dc_used_extents PARENT
4 2 dc_segments PARENT
5 0 dc_tablespaces PARENT
6 5 dc_tablespace_quotas PARENT
7 6 dc_files PARENT
8 10 dc_users PARENT
8 7 dc_users SUBORDINATE 0
8 7 dc_users SUBORDINATE 1
8 7 dc_users SUBORDINATE 2
9 8 dc_objects PARENT
9 8 dc_object_grants SUBORDINATE 0
10 17 dc_global_oids PARENT
11 12 dc_constraints PARENT
12 13 dc_sequences PARENT
13 16 dc_histogram_defs PARENT
13 16 dc_histogram_data SUBORDINATE 0
13 16 dc_histogram_data SUBORDINATE 1
14 32 kqlsubheap_object PARENT
15 19 dc_table_scns PARENT
15 19 dc_partition_scns SUBORDINATE 0
16 18 dc_outlines PARENT
17 14 dc_profiles PARENT
18 47 realm cache PARENT
18 47 realm auth SUBORDINATE 0
19 48 Command rule cache PARENT
20 49 Realm Object cache PARENT
20 49 Realm Subordinate Cache SUBORDINATE 0
21 46 Rule Set Cache PARENT
22 34 extensible security user and rol PARENT
23 35 extensible security principal pa PARENT
24 37 extensible security UID to princ PARENT
25 36 extensible security principal na PARENT
26 33 extensible security principal ne PARENT
27 38 XS security class privilege PARENT
28 39 extensible security midtier cach PARENT
29 44 event map PARENT
30 45 format PARENT
31 43 audit collector PARENT
32 15 global database name PARENT
33 20 rule_info PARENT
34 21 rule_or_piece PARENT
34 21 rule_fast_operators SUBORDINATE 0
35 23 dc_qmc_ldap_cache_entries PARENT
36 52 qmc_app_cache_entries PARENT
37 53 qmc_app_cache_entries PARENT
38 27 qmtmrcin_cache_entries PARENT
39 28 qmtmrctn_cache_entries PARENT
40 29 qmtmrcip_cache_entries PARENT
41 30 qmtmrctp_cache_entries PARENT
42 31 qmtmrciq_cache_entries PARENT
43 26 qmtmrctq_cache_entries PARENT
44 9 qmrc_cache_entries PARENT
45 50 qmemod_cache_entries PARENT
46 24 outstanding_alerts PARENT
47 22 dc_awr_control PARENT
48 25 SMO rowcache PARENT
49 40 sch_lj_objs PARENT
50 41 sch_lj_oids PARENT
60 rows selected.
The first child protects dc_rollback_segments. We can confirm it by referencing data in v$rowcache:

SQL> select parameter, gets
2 from v$rowcache
3 order by gets desc;
PARAMETER GETS
-------------------------------- ----------
dc_rollback_segments 310995555
dc_tablespaces 76251831
dc_segments 3912096
dc_users 2307601
dc_objects 1460725
dc_users 608659
dc_histogram_defs 250666
global database name 67475
dc_histogram_data 43098
dc_histogram_data 14364
dc_global_oids 14320
outstanding_alerts 2956
dc_profiles 2555
dc_awr_control 1925
dc_object_grants 745
dc_files 532
dc_constraints 201
sch_lj_oids 158
dc_sequences 156
dc_table_scns 20
sch_lj_objs 18
dc_qmc_ldap_cache_entries 0
qmc_app_cache_entries 0
qmc_app_cache_entries 0
qmtmrcin_cache_entries 0
qmtmrctn_cache_entries 0
qmtmrcip_cache_entries 0
qmtmrctp_cache_entries 0
qmtmrciq_cache_entries 0
qmtmrctq_cache_entries 0
qmrc_cache_entries 0
qmemod_cache_entries 0
SMO rowcache 0
dc_users 0
dc_partition_scns 0
dc_users 0
realm auth 0
Realm Subordinate Cache 0
rule_or_piece 0
rule_info 0
audit collector 0
format 0
event map 0
extensible security midtier cach 0
XS security class privilege 0
extensible security principal ne 0
extensible security principal na 0
extensible security UID to princ 0
extensible security principal pa 0
extensible security user and rol 0
Rule Set Cache 0
Realm Object cache 0
Command rule cache 0
realm cache 0
dc_outlines 0
kqlsubheap_object 0
dc_tablespace_quotas 0
dc_used_extents 0
rule_fast_operators 0
dc_free_extents 0
60 rows selected
The next step is to see whether the latch miss source can give us some more hints regarding the issue:

SQL> select "WHERE", sleep_count, location
2 from v$latch_misses
3 where parent_name='row cache objects'
4 and sleep_count > 0;
WHERE SLEEP_COUNT LOCATION
------------------- ----------- ------------------------------
kqrpre: find obj 20612167 kqrpre: find obj
kqrpup 7 kqrpup
kqrcmt: while loop 1 kqrcmt: while loop
kqrcmt: clear flag 1 kqrcmt: clear flag
kqreqd 1026837 kqreqd
kqreqd: reget 2602576 kqreqd: reget
6 rows selected
Now if you take kqrpre: find obj and plug it into a search on My Oracle Support, you'll quickly arrive at Bug 5749075 "High Requests on dc_rollback_segments". Among other things, this note points at an unusually high number of undo segments being created because cleanup is not able to work properly...

SQL> select count(*) from dba_rollback_segs;
COUNT(*)
----------
14838

...and this seems to be the case. The only difference is that the issue was observed on 11gR2, while the bug was filed against older versions; it was still worth checking in case we were seeing a regression. Indeed, after getting rid of that many undo segments by simply recreating the undo tablespace, the issue, though not completely gone, manifested itself far less often, making its impact on the process throughput relatively insignificant.
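For reference, swapping in a fresh undo tablespace can look roughly like this (names, sizes and file locations are purely illustrative):

create undo tablespace UNDOTBS2 datafile '/u01/oradata/DB/undotbs2_01.dbf' size 8g;
alter system set undo_tablespace = UNDOTBS2 scope=both;
-- wait until no active transactions still use segments in the old undo tablespace, then:
drop tablespace UNDOTBS1 including contents and datafiles;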
Great post, but I don't get the part with:
select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate#
from v$rowcache r,x$kqrst s
where r.cache#=s.kqrstcid
order by 1,4,5;
How do you conclude that it's dc_rollback_segs related?
Regards
Greg
You go by the child latch number. We know that CHILD# 1 is attracting almost all of the contention. From this query you check which line corresponds to LATCH# 1 (which is the row cache objects child latch number) - that would be dc_rollback_segments.
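If it helps, the two lookups can be folded into a single query; a sketch along the lines of the queries above (x$ tables typically require connecting as SYS):

select distinct c.child#, c.sleeps, r.parameter, r.type
  from v$latch_children c, x$kqrst s, v$rowcache r
 where c.name = 'row cache objects'
   and s.kqrstcln = c.child#   -- the child latch number maps to kqrstcln
   and r.cache# = s.kqrstcid
 order by c.sleeps desc;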
select latch#, child#, sleeps from v$latch_children where name='row cache objects' and sleeps>0 order by sleeps desc;
LATCH# CHILD# SLEEPS
---------- ---------- ----------
270 8 144401
270 9 38280
270 4 6582
270 5 6536
270 7 4476
270 13 1984
270 12 892
270 17 149
270 32 128
270 1 15
270 10 4
select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate# from v$rowcache r,x$kqrst s where r.cache#=s.kqrstcid order by 1,4,5;
LATCH# CACHE# NAME TYPE SUBORDINATE#
---------- ---------- -------------------------------- ----------- ------------
1 3 dc_rollback_segments PARENT
2 1 dc_free_extents PARENT
3 4 dc_used_extents PARENT
4 2 dc_segments PARENT
5 0 dc_tablespaces PARENT
6 5 dc_tablespace_quotas PARENT
7 6 dc_files PARENT
8 10 dc_users PARENT
8 7 dc_users SUBORDINATE 0
8 7 dc_users SUBORDINATE 1
8 7 dc_users SUBORDINATE 2
9 8 dc_objects PARENT
9 8 dc_object_grants SUBORDINATE 0
10 17 dc_global_oids PARENT
11 12 dc_constraints PARENT
In my case, is it 'dc_users' related? Thanks!
Hi Alex
We hit the same problem in our database; we are on 11.1.0.7. We found that 40% of the undo segments are offlined as well, as part of the symptom:
SQL> select count(*) USOn from dba_rollback_segs where status='ONLINE';
USON
----------
2789
SQL> select count(*) USOff from dba_rollback_segs where status='OFFLINE';
USOFF
----------
2138
Check MetaLink Note 240746.1.
It has a workaround of setting the _rollback_segment_count hidden parameter.
thanks
---Raj [rajorshisen@yahoo.co.in]
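In case it's useful, the workaround from that note would look something along these lines (the value is purely illustrative; hidden parameters should only be changed under Oracle Support's guidance):

-- ask the instance to keep roughly this many undo segments online
alter system set "_rollback_segment_count" = 1000 scope=spfile;

With scope=spfile the change takes effect on the next instance restart.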
As a follow-up, there's a patch for this. There's a fixed number of latches - 128 - for rollback segments. There's a patch to increase that number, but it's currently not available for Linux.
Bug 14226599 - Increase dc_rollback_segs hash buckets to reduce 'latch: row cache objects' waits [ID 14226599.8]
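For reference, the number of dc_rollback_segments entries the row cache is currently holding (roughly what those hash buckets have to accommodate) can be checked directly; a minimal sketch:

select r.parameter, r.count, r.usage, r.gets
  from v$rowcache r
 where r.parameter = 'dc_rollback_segments';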