Correlated Multiphasic Analysis (CMA) was developed as a way to use the DNA member matches of known relations to explore the less well-documented areas of your family tree.
Developing CMA's methods and procedures included preliminary trials using my own family matches. Rather than download every match for each subject in those trials, we set a practical lower limit of 10 cM as a reasonable threshold of reliability. But just how realistic is that value?
- Would a lower cM threshold degrade the reliability and usefulness of CMA?
- Would a larger data set yield insights not otherwise obtainable?
- If trade-offs between accuracy and the size of the data set are unavoidable, what cM value represents an acceptable "sweet spot"?
A Modest Experiment
To explore this further, I attempted a basic analysis using three subjects whose relationships are assuredly known: my mother, my father, and myself, all of whom have tested with AncestryDNA.
Objective
The basic idea of this experiment is to evaluate the degree to which anticipated shared matches differ from actual results.
Background
My parents and I have each tested with AncestryDNA:
Subject Name | Matches | Lower Limit of Matches | Ethnicity | Year Tested |
Mom | 41,952 | 8 cM | England 75%, Ireland 9%, Scotland 8%, Norway 6%, Sweden & Denmark 2% | 2020 |
Dad | 94 | 8 cM | Southern India 100% | 2021 |
Me | 19,726 | 6 cM | England 42%, Southern India 40%, Northern India 10%, Scotland 6%, Sweden & Denmark 2% | 2012 |
Owing to my father's Subcontinent Indian heritage, it's not entirely surprising that my mother has roughly 446 times as many matches as my father (41,952 vs. 94). Curiously, perhaps as a result of changes in Ancestry's testing or reporting methods, my results have a lower limit of 6 cM, whereas each of my parents has a limit of 8 cM.
Hypothesis
Since I'm descended from my parents, one would expect every one of my 19,726 AncestryDNA member matches to also match one or both of my parents. The extent to which this holds should give us some idea of AncestryDNA's reliability at various cM thresholds.
Method
• Each subject's matches were downloaded to a .csv file using the DNAGedcom Client.
• The .csv data was then imported into a clean copy of the CMA Master Workbook (v3).
• A three-subject comparison was performed, evaluating the matches shared among my parents and me.
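The comparison step boils down to set intersection on match identifiers. The sketch below is a minimal illustration under stated assumptions, not the CMA Master Workbook's actual logic: it assumes each export is a .csv with one row per match and a hypothetical `match_id` column (the real DNAGedcom Client column names may differ), and it uses small toy data in place of the real files.

```python
import csv
import io
from itertools import combinations

# Hypothetical column name: the real DNAGedcom Client export may label the
# match identifier differently, so adjust MATCH_ID_COLUMN to suit.
MATCH_ID_COLUMN = "match_id"


def load_match_ids(csv_file):
    """Return the set of match identifiers found in one subject's export."""
    reader = csv.DictReader(csv_file)
    return {row[MATCH_ID_COLUMN] for row in reader if row.get(MATCH_ID_COLUMN)}


def pairwise_shared(subjects):
    """Count the matches shared by every pair of subjects."""
    return {
        (a, b): len(subjects[a] & subjects[b])
        for a, b in combinations(subjects, 2)
    }


# Toy data standing in for the three exported .csv files (not real results).
exports = {
    "Mom": "match_id\nA\nB\nC\nME\n",
    "Dad": "match_id\nD\nME\n",
    "Me": "match_id\nA\nB\nD\nX\n",
}
subjects = {name: load_match_ids(io.StringIO(text)) for name, text in exports.items()}
shared = pairwise_shared(subjects)
for (a, b), n in shared.items():
    print(f"{a} & {b}: {n}")
```

With real exports, each file would be opened with `open(path, newline="")` and passed to `load_match_ids` in place of the `io.StringIO` objects.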
Observations
We began by evaluating the matches shared among the test subjects:
Intersection of | Mom | Dad | Me |
Mom | • | 1 | 16,211 |
Dad | 1 | • | 35 |
Me | 16,211 | 35 | • |
To clarify: I'm the single match shared by my parents, which is to say my test kit identifier appears in all three sets of matches. Expressed as a Venn diagram, we see:
Analysis
What's particularly striking about the diagram is the 3,514 member matches in my profile that don't appear to match either of my parents. Since I share 3,488 cM with Mom and 3,485 cM with Dad, there can be no doubt that they are my parents, and so every one of my 19,726 member matches should also appear among either Mom's 41,952 matches or Dad's 94 matches. Let's dig deeper and see how these "missing matches" break down:
Linkage Shared with Me | Number of "missing matches" |
18 cM | 1 |
17 cM | 1 |
16 cM | 5 |
15 cM | 10 |
14 cM | 11 |
13 cM | 26 |
12 cM | 71 |
11 cM | 136 |
10 cM | 303 |
9 cM | 1,219 |
8 cM | 1,667 |
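A breakdown like the one above can be sketched as "my matches minus the union of my parents' matches," tallied by cM. This is a hypothetical illustration with toy data; in particular, it assumes fractional cM values are binned by truncating to a whole centimorgan, which may not match how the table above was built.

```python
import math
from collections import Counter


def missing_by_cm(my_matches, mom_ids, dad_ids):
    """Bin my matches that appear in neither parent's list by whole cM.

    my_matches maps a match identifier to the cM that match shares with me;
    mom_ids and dad_ids are sets of each parent's match identifiers.
    """
    parents = mom_ids | dad_ids
    # Keep only matches absent from both parents' lists (the "missing matches").
    missing_cms = [cm for mid, cm in my_matches.items() if mid not in parents]
    # Assumption: bins are formed by truncating to a whole centimorgan.
    return Counter(math.floor(cm) for cm in missing_cms)


# Toy data, not real results.
my_matches = {"A": 18.0, "B": 9.4, "D": 12.0, "X": 9.0, "Y": 8.2}
mom_ids = {"A", "C"}
dad_ids = {"D"}

bins = missing_by_cm(my_matches, mom_ids, dad_ids)
for cm in sorted(bins, reverse=True):
    print(f"{cm} cM | {bins[cm]}")
```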
One simplifying factor of my diverse parentage is that my father's DNA matches (with the exception of one or two low-cM matches) are all members with distinctive Subcontinent Indian names, so it's fairly obvious that the 3,514 "missing matches" appearing only among my DNA matches are likely connected through my mother's ancestral lines.
Indeed, the "missing" 18 cM match is one shared with a 2nd cousin on my maternal grandfather's side. Ancestry shows the 17 cM match as a 4C1R via a "common ancestor" through the same lines. Spot-checking a few of the lower-cM "missing matches" shows that some of them have "shared matches" with higher-cM matches that also match my mother.
Knowing that these "missing" matches should appear among my mother's member matches, let's augment the table with a census of my mother's matches at each corresponding level of linkage:
Linkage Shared with Me | Number of "Missing Matches" | Mom's Matches @ this Linkage | % Errors | % Accuracy | Effective Accuracy |
18 cM | 1 | 692 | 0.14% | 99.86% | 99.71% |
17 cM | 1 | 861 | 0.12% | 99.88% | 99.77% |
16 cM | 5 | 1,180 | 0.42% | 99.58% | 99.15% |
15 cM | 10 | 1,613 | 0.62% | 99.38% | 98.76% |
14 cM | 11 | 2,142 | 0.51% | 99.49% | 98.98% |
13 cM | 26 | 2,589 | 1.00% | 99.00% | 98.00% |
12 cM | 71 | 3,290 | 2.16% | 97.84% | 95.73% |
11 cM | 136 | 4,538 | 3.00% | 97.00% | 94.10% |
10 cM | 303 | 6,227 | 4.87% | 95.13% | 90.50% |
9 cM | 1,219 | 9,499 | 12.83% | 87.17% | 75.98% |
8 cM | 1,667 | 6,412 | 26.00% | 74.00% | 54.76% |
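The three derived columns follow directly from the two counts in each row. As a check, this sketch recomputes them from the counts transcribed from the table above, taking "% Errors" as missing matches divided by Mom's matches, "% Accuracy" as its complement, and "Effective Accuracy" as the square of "% Accuracy"; the function and variable names are illustrative only.

```python
# Counts transcribed from the table above: cM -> (my missing matches, Mom's matches)
counts = {
    18: (1, 692), 17: (1, 861), 16: (5, 1180), 15: (10, 1613),
    14: (11, 2142), 13: (26, 2589), 12: (71, 3290), 11: (136, 4538),
    10: (303, 6227), 9: (1219, 9499), 8: (1667, 6412),
}


def accuracy_columns(missing, moms):
    """Return (% errors, % accuracy, effective accuracy) as fractions."""
    error = missing / moms
    accuracy = 1 - error
    effective = accuracy ** 2  # errors assumed equally likely in both directions
    return error, accuracy, effective


rows = {cm: accuracy_columns(m, n) for cm, (m, n) in counts.items()}
for cm, (err, acc, eff) in rows.items():
    print(f"{cm} cM | {err:.2%} | {acc:.2%} | {eff:.2%}")
```

Each printed row should agree with the corresponding row of the table to two decimal places.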
The aggregate data from the table indicates that in the range of 8 to 18 cM, the 3,450 "missing matches" in my test results amount to 8.84% of my mother's 39,043 matches in the same cM range. The "% Errors" column (the number of my "missing matches" as a share of my mother's total matches at the same cM level) increases sharply below 10 cM. If one of my "missing" 9 cM matches were to correspond to a member with whom my mother should share 12 cM, the proportions would be greater still.
Further, accepting that the "errors" seen in my mother's results are just as likely to exist in some form in my results, we can estimate the combined accuracy, accounting for matches missing in both directions (members I match whom my mother doesn't, plus matches I should have but don't), as roughly the "% Accuracy" value multiplied by itself. This is the "Effective Accuracy" value listed in the rightmost column.
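Written out, the column definitions amount to the following, where the symbols are introduced here for convenience: for each cM level c, m_c is the number of my "missing matches" and M_c is the number of Mom's matches at that level. Squaring treats the errors in each direction as independent and equally likely, which is the working assumption stated above.

```latex
e_c = \frac{m_c}{M_c}, \qquad A_c = 1 - e_c, \qquad E_c = A_c^{2}

\text{e.g. at } 10\ \text{cM:}\quad e = \frac{303}{6227} \approx 0.0487,\quad A \approx 0.9513,\quad E \approx 0.9513^{2} \approx 0.9050
```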
The aggregate "Effective Accuracy" for the 8 to 18 cM range of matches is 83.11%.
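The aggregate figures can be reproduced by pooling the counts across all eleven cM levels (total missing matches over Mom's total matches) rather than averaging the per-level percentages. This short sketch, using the counts from the table, shows that pooling recovers both the 8.84% error rate and the 83.11% effective accuracy.

```python
# Per-level counts from the table, 18 cM down to 8 cM.
missing = [1, 1, 5, 10, 11, 26, 71, 136, 303, 1219, 1667]
moms = [692, 861, 1180, 1613, 2142, 2589, 3290, 4538, 6227, 9499, 6412]

total_missing = sum(missing)  # pooled "missing matches", 8 to 18 cM
total_moms = sum(moms)        # pooled count of Mom's matches, 8 to 18 cM
aggregate_error = total_missing / total_moms
aggregate_effective = (1 - aggregate_error) ** 2  # same squaring rule as per level

print(f"{total_missing:,} / {total_moms:,} = {aggregate_error:.2%} errors")
print(f"effective accuracy = {aggregate_effective:.2%}")
```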
Conclusions
Based on an isolated survey of the matches I share with my parents (primarily my mother), it's reasonable to be skeptical of the presence, or absence, of member matches below 10 cM. However, because the CMA process seeks to correlate the findings of multiple Test Subjects over the entirety of their correspondence, it remains advisable to cast as wide a net as possible, with the understanding that the presence or absence of matches below 10 cM may significantly affect findings.