A Proteos Program Retrospective: Establishing Protein Sequencing for Forensic Analysis

For the past three years, Signature Science has been working on the IARPA PROTEOS program to demonstrate that protein sequencing can be used for human forensic identification in samples where DNA is absent or degraded, specifically touch samples. Together with the University of North Texas Health Science Center and The Ohio State University, we have established novel sample preparation methods, analytical pipelines, and statistical frameworks to support future implementation in forensic labs.

 

From the outset, it was clear that touch sample protein extraction methods could not sacrifice any potential DNA quantity or quality compared to traditional methods. To overcome this hurdle, we developed a workflow that allows protein to be recovered following a conventional DNA collection and silica column-based extraction. This approach permits laboratories to follow their validated extraction process and then recover protein from the column matrix that would otherwise be discarded. This protein fraction can then be stored frozen and considered for analysis only if a sample did not produce a DNA profile suitable for comparison.

 

For protein sequencing, we have used both targeted and non-targeted protein detection by nanoflow liquid chromatography coupled to high-resolution, accurate-mass tandem mass spectrometry. We target a panel of 289 genetically variable peptides (GVPs) with allele frequencies < 80%. These markers have been identified with a positive predictive value of more than 95% across more than 250 samples. We have also identified 45 low-frequency (< 1%) variants across 52 individuals; these are highly discriminating but difficult to validate because they are inherently rare. To address this, we apply best practices in MS-based proteogenomics and adapt the latest deep learning predictive models to improve detection confidence.
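For readers less familiar with the metric, positive predictive value is the fraction of reported detections that are confirmed as true. The short Python sketch below shows the arithmetic only. It assumes that each GVP detection can be checked against a reference genotype for the donor, and the function name and counts are hypothetical, not figures from the program.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of reported detections that were confirmed by the reference."""
    total_calls = true_positives + false_positives
    if total_calls == 0:
        raise ValueError("no detections were reported, so PPV is undefined")
    return true_positives / total_calls


# Hypothetical counts for illustration only (not program data).
confirmed_detections = 970
unconfirmed_detections = 30
ppv = positive_predictive_value(confirmed_detections, unconfirmed_detections)
print(f"PPV = {ppv:.1%}")
```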

 

Similar to DNA analysis, our approach to forensic match statistics relies on the calculation of likelihood ratio (LR) estimates; however, LR calculation in the protein arena requires several fundamental shifts. For example, unlike DNA analysis, it is possible and often expected that only one protein allele at a given locus may be detectable. Using our LR tools, we consistently achieve single-source LR values on the order of 100 to 200 for touch samples and over 1 × 10⁵ for bulk skin cell samples, with correlated random match probability values as low as 1 × 10⁻⁸ for touch samples on brass shell casings and 1 × 10⁻¹¹ for bulk skin cell samples.
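As a rough illustration of how per-marker evidence can combine into a likelihood ratio when only detected peptides are counted, the Python sketch below multiplies per-locus ratios in log space. It is a deliberately simplified model and not the program's LR tool. It assumes independent loci, Hardy-Weinberg proportions, a person of interest who carries every detected variant, and the same detection probability under both hypotheses (so it cancels). It also treats non-detected alleles as uninformative, since a missing protein allele may simply reflect dropout. The function names and allele frequencies are hypothetical.

```python
import math
from typing import Iterable


def carrier_frequency(allele_freq: float) -> float:
    """Probability that a random person carries at least one copy of the
    variant allele, assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - allele_freq) ** 2


def log10_lr(detected_allele_freqs: Iterable[float]) -> float:
    """log10 of the combined LR over detected variant peptides.

    Per locus: P(detected | person of interest is the source) divided by
    P(detected | random person is the source) = 1 / carrier_frequency.
    Loci without a detection are skipped and contribute a factor of 1.
    """
    total = 0.0
    for p in detected_allele_freqs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
        total -= math.log10(carrier_frequency(p))
    return total


# Hypothetical allele frequencies for four detected GVPs (illustration only).
detected = [0.30, 0.10, 0.45, 0.05]
print(f"combined LR is about 10^{log10_lr(detected):.2f}")
```

Working in log space keeps the product numerically stable when many markers are combined, and it makes visible that rare variants contribute far more to the total than common ones.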

 

Combining these approaches, we have demonstrated the utility of proteomic analysis with both internally generated touch samples and blinded samples provided by a third party. These included samples placed on brass shell casings, wood, metal, glass, and environmentally degraded porous and non-porous surfaces. Numerous samples yielded insufficient DNA for comparison but supplied sufficient protein for correct identification against a panel of 52 potential contributors. Proteomics also aided in the analysis of mixed-contributor samples, highlighted by one example in which a contributor who was not identified in the DNA mixture was identified using protein markers. These results underscore the potential benefit of proteomic analysis for forensic labs in their analysis of trace samples with low-quantity or low-quality DNA.

This workshop is currently at capacity. You can join the waitlist on our registration page.

Brought to you by

Worldwide Association of Women Forensic Experts
