After that you'll end up with two distinct random numbers in the range from 1 to 50. What I'm going to ask you to do next is to pull the disks with the corresponding numbers out of your RAID10 array. What are the odds of your entire array going down?
In a classical RAID10 setup, where every drive is mirrored by a single other drive, you can calculate the probability in a straightforward fashion. After you've pulled the first drive out (it doesn't matter which one), there is only one specific disk out of the remaining 49 which you have to pull in order for the entire array to go bust. So your odds of taking the entire array down are 1/49, or roughly 2%.
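If you want to see the number spelled out, a throwaway query against DUAL does the trick (plain arithmetic, nothing ASM-specific about it):

SQL> select round(1/49*100, 2) pct from dual;

       PCT
----------
      2.04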
Now let's say that instead of a regular RAID10 array you've got a normal redundancy ASM disk group with two failure groups (25 disks each). What are the odds now?
ASM Mirroring
Before we answer the above question we need to realize that ASM does not mirror disks the way traditional RAID10 does. In fact, it doesn't mirror disks at all. It mirrors extents instead. All we know is that the extents from the disk you've just pulled out won't be mirrored within the same failure group, so that leaves the remaining 24 disks of that failure group as safe. But what about the 25 disks in the other failure group? How many of these disks are unsafe and will result in your normal redundancy disk group going south?
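One way to see extent-level mirroring for yourself is to peek at the extent map exposed through the undocumented X$KFFXP view (covered in the references below). Treat the query below as a sketch: I'm assuming the usual 11.2 column names (GROUP_KFFXP, NUMBER_KFFXP for the file number, XNUM_KFFXP for the virtual extent, LXN_KFFXP for the copy number and DISK_KFFXP for the disk), and file number 256 is just a placeholder for whatever v$asm_file reports in your disk group:

SQL> select x.xnum_kffxp extent#, x.lxn_kffxp copy#, x.disk_kffxp disk#, d.failgroup
       from x$kffxp x, v$asm_disk d
      where x.group_kffxp=1
        and x.number_kffxp=256
        and x.group_kffxp=d.group_number
        and x.disk_kffxp=d.disk_number
        and x.xnum_kffxp<=2
      order by x.xnum_kffxp, x.lxn_kffxp;

If the theory holds, every virtual extent should show up with two copies (LXN_KFFXP 0 and 1) sitting on disks from different failure groups. Which disks in the other failure group receive those copies is exactly what the next section is about.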
Disk Partnership
When mirroring extents ASM uses a concept called Disk Partnership. Every disk in a normal redundancy disk group has one or more partners which are used to mirror primary extents from that disk. This also means that losing a disk together with any of its partners is fatal to the disk group, as you'll nuke both a primary extent and its mirror copy (keep in mind that we're talking about pulling both disks out before the disk group gets a chance to rebalance). At least now we know what we need next in order to solve the puzzle: we need to find out how many partners each of the disks in our array has.
Disk Partners
Let's say that the first disk we've pulled out was disk number 0. The following query can be used to find all the partners of disk number 0 in the first disk group:
SQL> select p.number_kfdpartner, d.FAILGROUP
       from x$kfdpartner p, v$asm_disk d
      where p.disk=0
        and p.grp=1
        and p.grp=group_number
        and p.number_kfdpartner=d.disk_number;

NUMBER_KFDPARTNER FAILGROUP
----------------- ------------------------------
               25 FG2
               26 FG2
               27 FG2
               29 FG2
               38 FG2
               46 FG2
               48 FG2
               49 FG2

8 rows selected.

Pulling any of the above disks out at the same time as disk number 0 will be fatal for our normal redundancy disk group. In other words, once we pull the first disk out, there are 8 other disks out of 49 which are unsafe. That brings our odds up (or down, depending on which outcome you're interested in :) to 8/49, or a little bit more than 16%.
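The same throwaway arithmetic as before, this time for the ASM case:

SQL> select round(8/49*100, 2) pct from dual;

       PCT
----------
     16.33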
You can confirm that every disk has exactly 8 partners by running the following query:
SQL> select min(cnt), max(cnt)
       from (
        select number_kfdpartner disk_number, count(*) cnt
          from x$kfdpartner
         where grp=1
         group by number_kfdpartner
       );

  MIN(CNT)   MAX(CNT)
---------- ----------
         8          8
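As a side note, disk partnership is supposed to be a symmetric relationship: if disk A lists disk B as a partner, disk B should list disk A as well. A quick way to check that on your own system (a sketch reusing the same X$KFDPARTNER columns as above) is to count the partnership rows which lack a reverse entry:

SQL> select count(*) missing_reverse  -- should be 0 if partnership is symmetric
       from x$kfdpartner p1
      where p1.grp=1
        and not exists (
              select 1
                from x$kfdpartner p2
               where p2.grp=p1.grp
                 and p2.disk=p1.number_kfdpartner
                 and p2.number_kfdpartner=p1.disk
            );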
Partner Disk Count
Is there any way to control the number of partner disks which ASM uses for extent mirroring? It turns out that there is. Note that this is a completely unsupported operation, so you shouldn't be playing with it.
The parameter which controls the maximum number of partner disks is called _asm_partner_target_disk_part. In 11gR2 that parameter has a default value of 8. I didn't have a chance to check it in previous versions, but supposedly its default value there is 10 (1). So at least we know that Oracle itself sometimes changes it between releases.
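If you want to see the current value of this hidden parameter in your ASM instance, the usual trick of joining the undocumented X$KSPPI and X$KSPPCV views (connected with SYSASM/SYSDBA privileges) works here as well; consider this a convenience sketch rather than anything official:

SQL> select i.ksppinm name, v.ksppstvl value, i.ksppdesc description
       from x$ksppi i, x$ksppcv v
      where i.indx = v.indx
        and i.ksppinm = '_asm_partner_target_disk_part';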
All you need to do after changing this parameter is to rebalance the disk group:
SQL> alter system set "_asm_partner_target_disk_part"=2;

System altered.

SQL> alter diskgroup data rebalance;

Diskgroup altered.

So here it goes!

SQL> select p.number_kfdpartner, d.FAILGROUP
       from x$kfdpartner p, v$asm_disk d
      where p.disk=0
        and p.grp=1
        and p.grp=group_number
        and p.number_kfdpartner=d.disk_number;

NUMBER_KFDPARTNER FAILGROUP
----------------- ------------------------------
               25 FG2
               38 FG2
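To confirm the change took effect across the whole disk group and not just for disk 0, the partner-count query from earlier can be rerun once the rebalance completes; I'd expect it to report 2 as both the minimum and the maximum, though I'm restating the query here rather than showing fresh output:

SQL> select min(cnt), max(cnt)
       from (
        select number_kfdpartner disk_number, count(*) cnt
          from x$kfdpartner
         where grp=1
         group by number_kfdpartner
       );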
References
(1) Oracle Automatic Storage Management: Under-the-Hood & Practical Deployment Guide
(2) ASM Metadata and Internals
All tests were performed with Oracle Grid Infrastructure 11.2.0.1