Analysis of the Genealogy Process in Investigative Genetic Genealogy
The genealogy process is typically the most time-consuming part of investigative genetic genealogy and a limiting factor in its success. Our objective is to develop a systematic approach to performing the genealogy portion of investigative genetic genealogy efficiently.
We formulate a two-stage mathematical model of the genealogy process: an ascending stage that attempts to find the most recent common ancestors (MRCAs) between the unknown individual (the target) and each investigated match, and a descending stage that searches for a marriage among the descendants of the MRCAs. For any given set of investigated matches (and their genetic distances to the target), we compute the probability of identifying the target and the expected amount of work (i.e., the size of the final family tree). We also use stochastic dynamic programming to derive a policy that optimally chooses the next action (i.e., which match to investigate, which MRCA to descend from, or whether to terminate the investigation). We use data from 18 unidentified remains cases (nine solved, nine unsolved) from the DNA Doe Project to estimate the model's parameters, and we compare the optimal policy to a benchmark policy that ranks matches by their genetic distance to the target and descends only from known MRCAs.
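To make the idea of a policy that chooses "which match to investigate, or whether to stop" more concrete, the following is a deliberately tiny stochastic dynamic program. It is a sketch under strong assumptions and is not the authors' model: the rule that two MRCAs suffice to start the descending stage, the independence of matches, the reward of 1 per identification minus a hypothetical work penalty `LAMBDA` per person added to the tree, and every parameter value (`P_MRCA`, `COST`, `Q_DESCEND`, `D_DESCEND`) are illustrative inventions, not estimates from the DNA Doe Project data.

```python
from functools import lru_cache

# Hypothetical toy parameters (NOT estimated from any real case data):
# P_MRCA[i]: assumed probability that investigating match i reveals an MRCA with the target
# COST[i]:   assumed tree-size cost (people added) of investigating match i
P_MRCA = (0.9, 0.5)
COST = (10.0, 10.0)
Q_DESCEND = 0.8   # assumed probability that descending from two MRCAs finds the target
D_DESCEND = 20.0  # assumed tree-size cost of the descending stage
LAMBDA = 0.01     # assumed penalty per person added, relative to a reward of 1 for identification


@lru_cache(maxsize=None)
def value(remaining: frozenset, mrcas_found: int) -> tuple[float, str]:
    """Return (optimal expected reward, best action) for a toy state.

    State: the set of matches not yet investigated and the number of MRCAs found.
    """
    if mrcas_found >= 2:
        # Simplification: two MRCAs are enough to begin the descending stage,
        # which is worth doing only if its expected reward exceeds stopping (0).
        descend = Q_DESCEND - LAMBDA * D_DESCEND
        return (descend, "descend") if descend > 0 else (0.0, "stop")
    best, action = 0.0, "stop"  # terminating the investigation yields reward 0
    for i in sorted(remaining):
        rest = remaining - {i}
        p = P_MRCA[i]
        # Bellman recursion: pay the work cost now, then branch on whether an MRCA is found.
        ev = (-LAMBDA * COST[i]
              + p * value(rest, mrcas_found + 1)[0]
              + (1 - p) * value(rest, mrcas_found)[0])
        if ev > best:
            best, action = ev, f"investigate match {i}"
    return best, action


if __name__ == "__main__":
    v, a = value(frozenset(range(len(P_MRCA))), 0)
    print(f"expected reward {v:.3f}, first action: {a}")
```

With these made-up numbers the recursion prefers investigating the less certain match first, because a failure there lets the investigation stop before spending effort on the other match. A benchmark that simply orders matches by genetic distance has no such look-ahead.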
A key focus of our study is to assess the benefit of aggressively descending from an ancestor of a match that is not known for certain to be an MRCA of the match and the target. We also assess the utility of GEDmatch's auto-cluster tool.
This analysis allows the difficulty of a case to be assessed in advance, and it proposes strategies that may increase the probability of identifying the unknown individual while decreasing the case workload.
Copyright © 2024 ISHI. All Rights Reserved.