SeqOne posters at the ESHG: using ML and big data to improve variant interpretation

ESHG 2022 - booth X4-664

SeqOne will be at the ESHG conference in Vienna, in Hall X4, booth 664. We will be presenting four posters highlighting projects by SeqOne’s R&D team that use machine learning and big data to better prioritize variants, improving the accuracy and efficiency of genomic data interpretation. Stop by our booth to meet the authors and discuss your specific needs.

Automated prioritization of copy number variants with ACMG/ClinGen standards

Poster Presentation No. P18.056.D
Session No. & Title: PV04 – Poster Viewing with Authors (Group D)
Session Date & Time: 13/06/2022 15:45–16:45 CEST
Location: Poster Hall X3
Jiri Ruzicka, Kévin Yauy, Nicolas Duforet-Frebourg, Laure Raymond, Mélanie Broutin, Jérôme Audoux, Sacha Beaumeunier, Nicolas Philippe, Denis Bertrand

Background: With the rising adoption of long-read sequencing technologies, numerous previously undetected copy-number variants (CNVs) are now accessible, and prioritizing them has become necessary for clinical evaluation. ACMG and ClinGen have published guidelines for the clinical interpretation of such variants, which allow more consistent prioritization of CNVs.

Methods: We present an original implementation of the ACMG/ClinGen recommendations, adapted to both small and large CNVs. Classifications were based on dosage sensitivity maps, general population frequency, phenotype matching and disease inheritance patterns. The model's performance was evaluated against the published ACMG/ClinGen dataset of 114 CNVs classified by two independent experts.
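The ACMG/ClinGen CNV framework is points-based: each piece of evidence in the scoring rubric adds positive or negative points, and the total is mapped to a five-tier class. The minimal sketch below shows only that final mapping step, using the published cut-offs; the evidence labels and point values in the example are hypothetical placeholders, and this is not SeqOne's implementation.

```python
def classify_cnv(total_points: float) -> str:
    """Map a summed ACMG/ClinGen CNV evidence score to a five-tier class.

    Cut-offs follow the 2020 ACMG/ClinGen CNV technical standards:
    >= 0.99 pathogenic, 0.90 to 0.98 likely pathogenic, -0.89 to 0.89 VUS,
    -0.90 to -0.98 likely benign, <= -0.99 benign.
    """
    score = round(total_points, 2)  # guard against floating-point drift at the cut-offs
    if score >= 0.99:
        return "Pathogenic"
    if score >= 0.90:
        return "Likely pathogenic"
    if score > -0.90:
        return "Uncertain significance"
    if score > -0.99:
        return "Likely benign"
    return "Benign"


def score_cnv(evidence: dict) -> tuple:
    """Sum the points awarded per evidence item, then classify the total.

    `evidence` maps a hypothetical evidence label to the points it earns.
    Deciding which rubric items apply (dosage sensitivity, population
    frequency, phenotype match, inheritance) is where a real tool does its work.
    """
    total = sum(evidence.values())
    return total, classify_cnv(total)


# Illustrative only: two hypothetical evidence items worth 0.45 points each.
print(score_cnv({"evidence_a": 0.45, "evidence_b": 0.45}))
```

Keeping the threshold mapping separate from evidence collection makes the classification auditable: each point can be traced back to a named criterion.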

Results: Our classification tool achieved 96.7% specificity for pathogenic variant identification, correctly identifying 15 of the 23 CNVs assessed as pathogenic by the two evaluators. Two additional CNVs could be classified as pathogenic when phenotypes were available. For 84.2% of CNVs, the prediction matched that of at least one evaluator. Among the 15.8% of predictions in disagreement, no variant classified as benign was predicted pathogenic, and vice versa.
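The figures above combine three different measures. A minimal sketch of how each can be computed from per-CNV calls follows; the function names and toy inputs are hypothetical, and the exact reference set used on the poster (for example, how evaluator disagreements were handled) is not stated here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly pathogenic CNVs that the tool calls pathogenic."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-pathogenic CNVs that the tool does not call pathogenic."""
    return true_neg / (true_neg + false_pos)


def agreement_rate(predictions, evaluator_a, evaluator_b) -> float:
    """Share of CNVs whose predicted class matches at least one of two experts."""
    hits = sum(p in (a, b) for p, a, b in zip(predictions, evaluator_a, evaluator_b))
    return hits / len(predictions)


# 15 of 23 expert-pathogenic CNVs recovered corresponds to a sensitivity of about 0.652.
print(round(sensitivity(15, 8), 3))  # 0.652
```

Specificity and agreement answer different questions: the first asks how rarely the tool raises a false pathogenic alarm, the second how often it lands on a class that at least one expert chose.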

Conclusion: This implementation of the ACMG/ClinGen standards provides automated, high-confidence classification of CNVs, accelerating the clinical interpretation of structural variants.

Clinically-driven, multi-layered, and interpretable machine learning model for assisted variant interpretation

Poster Presentation No. P18.070.B
Session No. & Title: PV02 – Poster Viewing with Authors (Group B)
Session Date & Time: 12/06/2022 16:00–17:00 CEST
Location: Poster Hall X3
Jiri Ruzicka, Nicolas Duforet-Frebourg, Laure Raymond, Jérôme Audoux, Sacha Beaumeunier, Denis Bertrand, Laurent Mesnard, Nicolas Philippe, Julien Thevenon, Kévin Yauy 

Background: With the rapid expansion of sequencing technologies and artificial intelligence tools, demand for interpretable variant classification is rising quickly, highlighting the need for a personalized approach based on the clinical context. However, the low interpretability of black-box machine learning models limits their adoption in the community.

Methods: We created a multi-layered machine learning model called ClassifyML, which scores the pathogenicity of genomic variants and prioritizes them according to the clinical context. ClassifyML gathers multi-level annotations based on ACMG-AMP evidence criteria, disease inheritance patterns, and phenotype matching. The model was first trained on the ClinVar variant classification dataset, then further trained on a cohort of 316 deeply phenotyped patients recruited from a French consortium.
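ClassifyML's internal architecture is not described here, so the sketch below swaps in the simplest interpretable stand-in: a logistic model over binary evidence criteria, trained in two stages (a first pass, then continued training from those weights on a second dataset). Each criterion's additive contribution to the logit gives a per-criterion importance readout. The feature list, learning rate, epochs, and toy data are all hypothetical.

```python
import math

# Hypothetical binary features: 1 if an evidence criterion is met for the variant.
FEATURES = ["PVS1", "PM2", "PP3", "BA1"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, bias, data, epochs=200, lr=0.1):
    """Stochastic gradient descent on the logistic (cross-entropy) loss.

    Starting from existing weights lets a second call continue training
    (fine-tune) from the result of a first stage.
    """
    weights = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def explain(weights, bias, x):
    """Return the probability and each criterion's additive contribution to the logit."""
    contributions = {name: w * xi for name, w, xi in zip(FEATURES, weights, x)}
    return sigmoid(bias + sum(contributions.values())), contributions


# Toy data, not real variants: stage 1 stands in for a large public dataset,
# stage 2 for a smaller, deeply phenotyped cohort.
stage1 = [([1, 0, 0, 0], 1), ([1, 1, 0, 0], 1), ([0, 1, 1, 0], 1),
          ([0, 0, 0, 1], 0), ([0, 1, 0, 1], 0), ([0, 0, 1, 0], 0), ([0, 0, 0, 0], 0)]
stage2 = [([1, 0, 1, 0], 1), ([0, 0, 1, 1], 0), ([0, 1, 0, 0], 1)]

w1, b1 = train([0.0] * len(FEATURES), 0.0, stage1)
w2, b2 = train(w1, b1, stage2, epochs=50)
prob, contrib = explain(w2, b2, [1, 1, 0, 0])
print(round(prob, 3), {k: round(v, 2) for k, v in contrib.items()})
```

The design trade-off is deliberate: a linear layer gives up some expressiveness, but every score decomposes exactly into named criteria that a clinician can inspect.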

Results: The model produces an interpretable output in the form of a continuous importance scale for each criterion, which assists the clinical interpretation of variants. We evaluated our method on a multicentric cohort of 310 patients. The model classified the causal variant as having pathogenic evidence in 291 of 310 cases, with a 39-fold improvement in median rank compared with Exomiser (3 versus 118).
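The "39-fold" figure is a ratio of median ranks. A minimal sketch of that evaluation, assuming each patient yields a score per candidate variant and the causal variant's 1-based position is recorded; the helper names and tie handling are illustrative choices, not the poster's protocol.

```python
from statistics import median


def rank_of(causal_id: str, scores: dict) -> int:
    """1-based rank of the causal variant after sorting by descending score.

    Ties keep input order here; a stricter benchmark might assign the
    worst tied rank instead.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(causal_id) + 1


def fold_improvement(baseline_ranks, new_ranks) -> float:
    """Ratio of median ranks; lower ranks are better, so > 1 means improvement."""
    return median(baseline_ranks) / median(new_ranks)


# With only the reported medians available, the ratio is simply 118 / 3.
print(round(fold_improvement([118], [3]), 1))  # 39.3
```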

Conclusion: ClassifyML is an interpretable machine learning model for pathogenicity prediction and variant prioritization. It predicts variant classifications, integrates the patient's clinical context, and yields human-explainable classifications.

A phenotype-gene based graph for symptoms description harmonization and clinically-driven genomic analysis 

Poster Presentation No. P18.011.C
Session No. & Title: PV03 – Poster Viewing with Authors (Group C)
Session Date & Time: 13/06/2022 12:45–13:45 CEST
Location: Poster Hall X3
Kévin Yauy, Nicolas Duforet-Frebourg, Jérôme Audoux, Sacha Beaumeunier, Denis Bertrand, Laurent Mesnard, Nicolas Philippe, Julien Thevenon

Background: Physicians may describe identical symptoms heterogeneously, even when relying on the same Human Phenotype Ontology (HPO). Several tools generate diagnostic hypotheses based on HPO term associations and proximity within the ontology, but they share common methodological limitations.

Methods: We built a phenotype-gene graph weighted by the consensus of associations identified in both structured and free-text databases, extracted with Elasticsearch. To handle the diversity of physicians' descriptions, we reduced the dimensionality of HPO terms using non-negative matrix factorization (NMF). Based on this graph, we developed a phenotype-gene matching algorithm called PhenoGenius. We evaluated our approach on a multicentric cohort of 316 patients recruited from a French consortium and 444 patients from the literature.
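A consensus-weighted graph can be as simple as counting how many sources support each phenotype-gene edge. The minimal sketch below assumes each source is a set of (HPO term, gene) pairs and scores a gene by summing edge weights to the patient's terms; the input format, the weighting, and the gene names are hypothetical, and PhenoGenius's actual scoring is not described here.

```python
from collections import defaultdict


def build_graph(sources):
    """Weight each phenotype-gene edge by the fraction of sources reporting it.

    `sources` is a list of sets of (hpo_term, gene) pairs, one set per
    database (structured or text-mined).
    """
    counts = defaultdict(int)
    for pairs in sources:
        for pair in pairs:
            counts[pair] += 1
    graph = defaultdict(dict)
    for (hpo, gene), n in counts.items():
        graph[gene][hpo] = n / len(sources)
    return graph


def rank_genes(graph, patient_terms):
    """Score each gene by its summed edge weights to the patient's HPO terms."""
    scores = {gene: sum(edges.get(t, 0.0) for t in patient_terms)
              for gene, edges in graph.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# HP:0001250 Seizure, HP:0001263 Global developmental delay, HP:0000252 Microcephaly.
# GENE_A / GENE_B are hypothetical genes.
sources = [
    {("HP:0001250", "GENE_A"), ("HP:0001263", "GENE_A"), ("HP:0001250", "GENE_B")},
    {("HP:0001250", "GENE_A"), ("HP:0000252", "GENE_B")},
]
graph = build_graph(sources)
print(rank_genes(graph, ["HP:0001250", "HP:0001263"]))
```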

Results: The graph contains more than 2 million phenotype-gene associations covering 4,974 genes and 9,687 symptoms, whereas the Monarch database contains nearly 640,000 associations. PhenoGenius achieves a median diagnostic gene rank of 68, compared with 144 to 355 for other algorithms. Reducing the 9,687 symptoms to 650 groups reduces the dispersion of diagnostic ranks (standard deviation reduced by 48%) without compromising ranking performance. Working with the 650 groups achieves complete coverage of the medical observations, extending matching to every observation and yielding 24 additional diagnoses.
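Grouping symptoms with NMF can be sketched at toy scale. The code below implements the Lee-Seung multiplicative updates for the Frobenius loss in plain Python and assigns each HPO term (a column of V) to the latent group with its largest loading. The matrix layout (rows as patients or genes, columns as terms), the argmax assignment, and the toy matrix are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m).

    Multiplicative updates keep every factor non-negative because each
    step multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # W^T V and W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # V H^T and W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius(v, w, h):
    """Reconstruction error ||V - WH||_F."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5


def group_terms(h):
    """Assign each term (column) to the latent group with the largest loading."""
    return [max(range(len(h)), key=lambda g: h[g][j]) for j in range(len(h[0]))]


# Toy patient x term matrix: terms 0-1 and terms 2-3 tend to co-occur.
V = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
W, H = nmf(V, 2)
print(group_terms(H), round(frobenius(V, W, H), 4))
```

On block-structured data like this, co-occurring terms tend to share a group; at real scale one would use an optimized library rather than list-based matrices.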

Conclusion: This work explored a weighted phenotype-gene association graph, independent of the development-based HPO hierarchy used to describe patients' phenotypes. PhenoGenius is an original method that harmonizes and maximizes the use of clinical symptoms in bioinformatic pipelines, outperforming currently published approaches.

Automated identification of a cancer patient treatment: from sequencing to treatment prioritisation

Poster Presentation No. P19.020.C
Session No. & Title: PV03 – Poster Viewing with Authors (Group C)
Session Date & Time: 13/06/2022 12:45–13:45 CEST
Location: Poster Hall X3
Nicolas Soirat, Denis Bertrand, Sacha Beaumeunier, Nicolas Philippe, Dominique Vaur, Sophie Krieger, Anne-Laure Bougé, Laurent Castera

Background/Objectives: The emergence of sequencing has allowed the scientific community to gather a tremendous amount of cancer genomic data, characterising biomarkers involved in tumorigenesis that may indicate potential treatments. Using short-read sequencing to identify treatments for cancer patients is becoming common practice in hospitals. Some prediction frameworks have been developed to standardise treatment identification, but most focus on a single alteration type and very few have been implemented in practice.

Methods: We designed a targeted DNA and RNA panel covering 639 cancer genes and 57 fusion genes to obtain a comprehensive view of each patient's genomic landscape. We developed a decision algorithm that prioritises all known variant-therapy associations. A set of rules scores each association based on more than 20 variant features describing the variant's impact in cancer, the patient's indication, and the similarity between the patient's variant and variants in therapeutic databases.
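Rule-based prioritisation reduces to summing points from independent rules and sorting. A minimal sketch follows; the `Association` schema, the four rules, and their point values are hypothetical stand-ins for the poster's 20+ features, not its actual rule set.

```python
from dataclasses import dataclass, field


@dataclass
class Association:
    """A candidate variant-therapy association (hypothetical schema)."""
    therapy: str
    features: dict = field(default_factory=dict)


# Hypothetical rules and point values; each inspects the feature dict.
RULES = [
    lambda f: 3.0 if f.get("same_indication") else 0.0,      # evidence is for the patient's tumour type
    lambda f: 2.0 if f.get("exact_variant_match") else 0.0,  # same variant as in the therapeutic database
    lambda f: 1.0 if f.get("oncogenic") else 0.0,            # variant predicted to have an impact in cancer
    lambda f: -2.0 if f.get("low_vaf") else 0.0,             # low variant allele fraction
]


def score(assoc: Association) -> float:
    """Total points from all rules for one association."""
    return sum(rule(assoc.features) for rule in RULES)


def prioritise(assocs):
    """Sort associations from highest to lowest score (ties keep input order)."""
    return sorted(assocs, key=score, reverse=True)


a = Association("drug_x", {"same_indication": True, "exact_variant_match": True})
b = Association("drug_y", {"oncogenic": True})
c = Association("drug_z", {"same_indication": True, "low_vaf": True})
print([x.therapy for x in prioritise([b, c, a])])
```

Additive rules keep each ranking explainable: the clinician can see which features pushed a therapy up or down.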

Results: We generated 1,000 simulated tumours, each containing passenger mutations and one targetable mutation from the CIViC database. Our method ranks the targetable mutation among its top predictions (average rank 2.19). Furthermore, on a cohort of 12 patients, our fully automated protocol produced results similar to those of two routine clinical approaches. We are currently extending the validation to a pan-cancer cohort of 500 patients.
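The simulation benchmark can be sketched as a small harness: build each toy tumour from one targetable mutation plus random passengers, rank with a scoring function, and average the target's rank. The identifiers, passenger count, and scorers below are hypothetical; the poster does not specify how many passengers each tumour contained.

```python
import random
from statistics import mean


def simulate_tumour(rng, targetable, passenger_pool, n_passengers):
    """One toy tumour: a single targetable mutation plus random passengers, shuffled."""
    tumour = [targetable] + rng.sample(passenger_pool, n_passengers)
    rng.shuffle(tumour)  # so tied scores do not systematically favour the target
    return tumour


def evaluate(score_fn, targetables, passenger_pool, n_tumours=1000, n_passengers=20, seed=0):
    """Average 1-based rank of the targetable mutation across simulated tumours."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_tumours):
        target = rng.choice(targetables)
        tumour = simulate_tumour(rng, target, passenger_pool, n_passengers)
        prioritised = sorted(tumour, key=score_fn, reverse=True)
        ranks.append(prioritised.index(target) + 1)
    return mean(ranks)


# Hypothetical identifiers: "t*" stand in for curated targetable variants
# (e.g. drawn from CIViC), "p*" for passengers.
targetables = [f"t{i}" for i in range(5)]
passengers = [f"p{i}" for i in range(100)]
print(evaluate(lambda m: 1.0 if m.startswith("t") else 0.0, targetables, passengers))
```

A perfect scorer yields an average rank of 1, and one that always ranks the target last yields `n_passengers + 1`, which bounds where a real method such as the one above should fall.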

Conclusion: We designed a complete framework for identifying variant-drug associations across multiple alteration types, making therapeutic choices easier for clinicians. We successfully integrated it into our variant-calling workflow and showed that our method performs well at prioritising targetable variants.