Using molecular clocks to tell cancer time

We all know that tracking metastatic cancer progression is vital for the ongoing monitoring of cancer patients, but how do metastatic cancers arise in the first place? Are they seeded simultaneously throughout the body, or do they arise progressively? How are they related to the original tumour? More importantly, how do different metastases respond to treatment, and how can we track this? Amazingly, new insights from the UK’s LEGACY breast cancer program are shedding light on some of these important questions!

This study simultaneously examined biopsies of primary and secondary tumours from two breast cancer donors, which revealed that secondary tumours were predominantly seeded from a single cell lineage of the primary tumour. This indicates a remarkable phenomenon of monoclonal seeding for metastatic breast cancers, which also aligns with previous studies showing the same progression in colorectal cancers. 

The good news is that if cancer metastases are clonal, tracking their dissemination and dynamics becomes more feasible. As a first step towards this, the scientists in this study used chemical markers that progressively accumulate in DNA as molecular clocks, tracing the lineages of metastatic cancers as they spread to secondary sites. These markers can support metastatic cancer monitoring by charting secondary cancer ‘family trees’ for each patient.

Next, they leveraged a readily available resource: circulating tumour DNA in blood samples from breast cancer patients. The frequencies of the molecular clock markers found in primary and secondary tumour biopsies correlated highly with their frequencies in plasma tumour DNA.

This is very encouraging, as it opens up a range of possibilities for following cancer progression with regular blood tests. First, the formation of metastases can be closely monitored, and their evolutionary lineages traced back to the original tumour using these molecular clocks, which could inform initial treatment options. Second, the relative pathogenic impact of each secondary tumour can be assessed by monitoring the levels of its lineage-specific molecular markers in the blood. Finally, the responses of the various metastatic tumours to different therapies can be tracked, giving doctors valuable real-time information to optimise personalised treatment for each patient.

We are very excited by ongoing genomics research to develop new ways of monitoring cancer evolution, pathogenicity and treatment outcomes using relatively simple blood biopsies, which promise to deliver better options for patients. SeqOne will continue to closely monitor progress in this rapidly growing field to enable seamless incorporation of these new techniques into our software platform for enhanced clinical outcomes.

Original article: https://www.nature.com/articles/s41467-020-15047-9

Detect fusion genes with SeqOne!

RNAseq analysis is now possible using the SeqOne Platform so that you can include the detection of expressed SNVs, Indels and fusion genes in your analysis.

We are proud to introduce our easy-to-use, intuitive fusion gene interface, which visually displays:

  • Fusion gene details, such as the identified partner
  • The specific exon and coordinates of the breakpoint
  • The number of reads supporting the fusion, to increase confidence in the results

A link to the COSMIC database helps you easily interpret whether the fusion is pathogenic. The interface also reports the protein domains involved, giving you important information to help you choose the best therapy for each patient.

Multiple alignments in your integrated genome browser

The IGV view integrated into SeqOne gains flexibility: you can now display multiple alignments in a single window.

Newsletter October 2020

A configurable display in IGV

To make navigation on the SeqOne platform smoother, and in particular the visualisation of variants in the integrated genome viewer, we now generate a lightweight alignment, centred on each sample’s variants, at the end of the GermlineVar, GermlineFamily, SomaVar and SomaDuo analysis pipelines.

Schematic representation of how the minimal sample BAM is generated

In practice, each sample now has two associated BAM files:

  • the raw alignment file, available for download from the Files tab;
  • the new “minimal” BAM file, or min.bam, generated at the end of the bioinformatic analysis and intended for IGV.
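One way to picture the minimal BAM is as the set of reads falling in genomic windows around each variant. The Python sketch below computes merged windows from variant positions; the 500 bp flank and the function names are hypothetical, since the exact approach SeqOne uses is not described here.

```python
def regions_around_variants(
    variants: list[tuple[str, int]], flank: int = 500
) -> list[tuple[str, int, int]]:
    """Merge windows of `flank` bp on each side of each variant.

    `variants` holds (chrom, 1-based position) pairs, as in a VCF.
    Returned regions are 0-based, half-open (chrom, start, end), as in BED.
    """
    # A 1-based position p is the 0-based base p - 1; the window spans
    # [p - 1 - flank, p + flank), clipped at the chromosome start.
    windows = sorted(
        (chrom, max(0, pos - 1 - flank), pos + flank) for chrom, pos in variants
    )
    merged: list[tuple[str, int, int]] = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping or adjacent to the previous window: extend it.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(regions: list[tuple[str, int, int]]) -> str:
    """Format regions as BED lines (chrom, start, end)."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)


regions = regions_around_variants([("chr1", 1000), ("chr1", 1300), ("chr2", 50)])
print(regions)  # [('chr1', 499, 1800), ('chr2', 0, 550)]
```

The BED text could then be used to subset the raw alignment, for example with `samtools view -b -L regions.bed -o sample.min.bam sample.bam` followed by `samtools index sample.min.bam`. Merging overlapping windows first avoids writing the same read twice when variants are close together.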

In addition, the integrated genome viewer gains flexibility and now supports multiple BAM files. These can be alignments of different samples at the same position, for comparison, or of the same sample (raw alignment, or an alignment generated by another analysis).

Usage

From the variant page, an options icon on the Genome browser tab gives you access to the alignment settings menu (Tracks settings).

Details of the options available in the genome browser

All the projects and samples in your account are then available for selection from drop-down menus.

Finally, you can select the desired BAM from those generated for the sample: either its raw alignment (labelled sample BAM file) or the file generated by the latest pipeline run on the sample (labelled BAM from latest analysis).

Menu for selecting the alignment to add to the Genome browser window

Selectively hide variants

The Déjà Vu feature lets you select mutations in your variant table in order to hide them.

Newsletter October 2020

What’s new

Bioinformatic analysis of NGS data from a large gene panel or an exome can yield a large number of variants, which must then be filtered to identify the mutation(s) responsible for a disease.

While SeqOne’s dynamic filter system narrows this list down to the variants relevant in the context of an analysis, it can be useful to set aside those that have been ruled out for the patient under study, so as not to spend further time on them during interpretation.

This is the idea behind the Déjà Vu feature, which lets you selectively and reversibly hide certain variants during your analysis.

How does it work?

Déjà Vu relies on selecting one or more variants in the variant table using a selection tool.

As soon as at least one variant is selected, an icon appears in the table header, allowing you to hide the selection. The selected variants are then greyed out.

Using Déjà Vu to hide a selection of variants.

This action is reversible: a second icon lets you make the desired variants reappear.

Icons used to show (left) and hide (right) a selection of variants, respectively.

Example of use

After inspecting a first series of variants, for example after applying a filter, you can quickly select and hide some of them. Once the filter is removed or replaced by another, these variants will still appear greyed out, so that you do not spend time on them again.

If your variant selection spans several pages of the variant table, only the selection visible on the current page will be hidden, to avoid any risk of error.

Unlike the VKB tool, marking variants as “déjà vu” applies only to the current analysis and is fully reversible.
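The behaviour described above (reversible hiding, scoped to one analysis, limited to the current page, and independent of filters) can be modelled in a few lines. The following Python sketch is a hypothetical illustration; `DejaVuState` and `paginate` are invented names, not SeqOne’s implementation.

```python
from dataclasses import dataclass, field


def paginate(rows: list[str], page: int, page_size: int = 50) -> list[str]:
    """Return the variant IDs displayed on a given (0-based) page."""
    return rows[page * page_size:(page + 1) * page_size]


@dataclass
class DejaVuState:
    """Hypothetical model of Déjà Vu: one instance per analysis."""
    hidden: set[str] = field(default_factory=set)

    def hide(self, selection: set[str], page_rows: list[str]) -> None:
        # Only selected variants visible on the current page are hidden.
        self.hidden |= selection & set(page_rows)

    def show(self, selection: set[str]) -> None:
        # Reversible: hidden variants can be made to reappear.
        self.hidden -= selection

    def render(self, filtered_rows: list[str]) -> list[tuple[str, bool]]:
        # The hidden set does not depend on the active filter, so hidden
        # variants stay greyed out (True) whatever filter is applied.
        return [(v, v in self.hidden) for v in filtered_rows]


rows = [f"var{i}" for i in range(120)]
state = DejaVuState()
state.hide({"var3", "var70"}, paginate(rows, 0))  # var70 is on the second page
print(state.hidden)  # {'var3'}
```

Keeping the hidden set on a per-analysis object is what separates this from a knowledge base such as VKB, whose annotations would outlive the analysis.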

Configure your variant table

Select the annotation columns you want from the dedicated tab, and save your custom column profiles for each analysis type.

Newsletter October 2020

New column management system

In addition to the columns displayed by default in your variant table, many annotation fields and other information can be added as extra columns.

To do so, simply expand the list of available sections from the Columns panel, located to the left of the table.

Column configuration menu in the variant table

The columns selected in this way persist from one analysis to the next for the duration of your session.

Saving a column profile

The configuration selected for your variant table can now be saved, so that it is no longer reset each time you log out.

This new system works like filter profiles: after expanding the column selection menu and ticking or unticking the desired columns, you can save the corresponding profile via the Select profile drop-down menu.

The profile thus created then becomes available in this menu, can be applied in a single click, and can be reset at any time:

Applying a saved column profile

Column profiles specific to each analysis type

As with filters, saved column profiles are specific to an analysis type: SomaVar, GermlineVar, GermlineFamily, etc.

By default, only the profiles usable in the current analysis are displayed, under the Personal profiles section. Profiles created from a different analysis type, and therefore incompatible with the current analysis, are listed under the Incompatible profiles section; hovering the mouse over such a profile shows the name of the compatible pipeline.
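The split between Personal profiles and Incompatible profiles amounts to grouping saved profiles by analysis type. The Python sketch below is a hypothetical model; the class, function and column names are illustrative assumptions, not SeqOne’s code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnProfile:
    """Hypothetical saved column profile, tied to one analysis type."""
    name: str
    analysis_type: str          # pipeline the profile was created in
    columns: tuple[str, ...]    # annotation columns to display


def group_profiles(
    profiles: list[ColumnProfile], current_type: str
) -> dict[str, list[ColumnProfile]]:
    """Split saved profiles into the two sections of the Select profile menu."""
    return {
        "Personal profiles": [p for p in profiles if p.analysis_type == current_type],
        "Incompatible profiles": [p for p in profiles if p.analysis_type != current_type],
    }


def hover_text(profile: ColumnProfile) -> str:
    # Shown when hovering an incompatible profile: the pipeline it belongs to.
    return profile.analysis_type


saved = [
    ColumnProfile("somatic-review", "SomaVar", ("Gene", "VAF", "COSMIC")),
    ColumnProfile("trio-review", "GermlineFamily", ("Gene", "Inheritance")),
]
menu = group_profiles(saved, current_type="SomaVar")
print([p.name for p in menu["Personal profiles"]])              # ['somatic-review']
print([hover_text(p) for p in menu["Incompatible profiles"]])   # ['GermlineFamily']
```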

Mobile element insertion (MEI) detection for NGS-based clinical diagnostics

A growing number of scientific articles describe the pathogenic role of MEIs, bringing renewed focus on their importance in clinical diagnosis. Although NGS makes it possible to capture these variants, identifying them remains a challenge that requires complex bioinformatic pipelines. This document describes the characteristics of MEIs and the challenges to be addressed in identifying them, then outlines a new approach developed by SeqOne to identify them in routine clinical settings.

MEI detection can significantly improve clinical diagnosis

Mobile element insertions are genomic variants that can significantly influence the genome and its biological function. They consist of endogenous DNA sequences that can copy and paste themselves into various genomic locations, and in doing so can disrupt important biological mechanisms, leading to disease. As more links between MEIs and pathologies are discovered, they are the subject of a growing number of studies. However, the difficulty of detecting them with existing bioinformatic solutions has limited their use in routine clinical settings. As a consequence, their influence and pathogenic associations are likely underestimated. SeqOne has developed a pipeline designed to detect MEIs and provide usable information on the impact of this type of genomic variant on the diagnosis.

MEI mechanism and detection challenges

Mobile element insertions are genomic structural variants produced by retrotransposition: mobile elements move to new genomic locations through a “copy-and-paste” mechanism, potentially disrupting gene function as they do so. This process relies on reverse transcription of an RNA intermediate (Figure 1). Several types of mobile elements give rise to MEIs, including LINE-1 (L1), SVA and Alu. Approximately 500,000 Long INterspersed Element-1 (L1) copies and 1.1 million Alu elements have been identified, comprising 17% and 11% of the human genome sequence respectively [1]. SINE-VNTR-Alu (SVA) elements are rarer and constitute approximately 0.2% of the human genome sequence [1].

MEIs were initially detected using array CGH, Southern blot, Sanger sequencing or qPCR, all of which have limitations in detecting this type of structural variant [1]. For instance, Sanger sequencing has limited ability to detect larger insertions such as L1 elements [1]. Next-generation sequencing (NGS) opens new perspectives for detecting these variants, but MEI detection requires dedicated bioinformatic pipelines. Because MEIs are structural variants that insert a large sequence into the genome, reads spanning an insertion breakpoint only partially align to the reference, and their unaligned part is soft-clipped during mapping. A further difficulty is that the inserted sequence is also present at many other locations in the genome, so reads originating from the insertion can map to those other copies, resulting in discordant read mapping across the genome [2]. Moreover, the large number of copies in the genome can introduce mapping artifacts and lead to false positives, which makes numerous filtering steps necessary [2] (Figure 1).

Figure 1: Retrotransposition mechanism and NGS detection specificity
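To illustrate what discordant mapping means, the Python sketch below flags a read pair as discordant when its mates map to different chromosomes or lie further apart than an expected insert size. The `ReadPair` type and the 1,000 bp threshold are assumptions; real callers typically model the library's insert-size distribution and mate orientation, which this simplified check ignores.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two mates of a paired-end read (hypothetical type)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(pair: ReadPair, max_insert: int = 1000) -> bool:
    """Flag pairs whose mates map to different chromosomes or too far apart.

    If one mate comes from an inserted Alu, it may map to one of the many
    reference Alu copies elsewhere, producing exactly this pattern.
    """
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_insert


print(is_discordant(ReadPair("chr1", 10_000, "chr1", 10_350)))     # False
print(is_discordant(ReadPair("chr1", 10_000, "chr7", 5_200_000)))  # True
```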

Impact of MEIs on patient health

Because they can be actively copied and pasted into different genomic positions, mobile elements inserted into the genome can disrupt gene regulation and function, leading to genetic disorders. To date, more than 120 pathogenic variants caused by retrotransposon activity have been documented: 76 caused by Alu, 30 by L1 and 13 by SVA [3]. They are involved in numerous diseases, including hemophilia A and B, breast cancer, cystic fibrosis and Apert syndrome (Table 1). Hemophilia A (1 in 5,000 male births) and B (1 in 30,000 births) are rare X-linked disorders caused by mutations in the F8 and F9 genes, which encode coagulation factors VIII and IX [4]. In severe forms, deep internal bleeding, especially into the joints, can lead to long-term disability, including muscle atrophy, pseudotumors, impaired mobility and chronic pain [4]. Cystic fibrosis is the most common genetic disorder among Caucasian children (prevalence between 1 in 8,000 and 1 in 10,000 in Europe). It is characterized by the production of thick mucus that severely damages the lungs and digestive system and can be fatal. Impairments of the CFTR gene are associated with this disease [5]. Apert syndrome is a rare genetic disease characterized by skeletal abnormalities and associated with mutations in the FGFR2 gene [6].

Recently, 37 unique pathogenic retrotransposon insertions were identified in 10 cancer risk genes [1]. In another recent study, Torene et al. analyzed 89,874 clinical exomes and reported 14 MEIs classified as pathogenic or likely pathogenic according to ACMG criteria [7]. The same study estimates that assessing MEIs could increase diagnostic yield by 0.15%, and that MEIs are responsible for disease in 0.04% to 0.1% of individuals with a suspected genetic disease [7]. Together, these studies show that MEIs are involved in numerous heritable pathologies. Table 1 summarizes some of the examples reported in the literature:

Table 1: Examples of genes in which MEIs have been found

The SeqOne approach for detecting MEIs

SeqOne has developed a new methodology for detecting MEIs, currently available in our germline pipeline. This pipeline is composed of three main steps, each containing several filtering and control sub-steps.

  • STEP-I: MEI detection

The aim of this step is to detect all candidate MEI breakpoints and their consensus sequences. This step includes three sub-steps:

  1. Retrieval of soft-clipped reads. The soft-clipped sequence must be at least 5 bp long, the cut-off above which we consider it of interest for the subsequent steps.
  2. Clustering by genomic position. Only soft-clipped reads of sufficient quality are considered in this step; read quality is calculated from the quality of each base and the read length. To be selected, a cluster must contain at least 10 good-quality soft-clipped reads (default value). This step also includes a filter on the maximum number of neighboring breakpoints for a given cluster. This filter is important because the more soft-clipped reads occur near a position, the more background noise is observed, making the region harder to analyze.
  3. Retrieval of the consensus sequences. Consensus sequences are selected based on their length, number of mismatches and mean read quality. Selecting regions that meet our quality criteria in this way limits false positives, as regions with a high number of mismatches are more likely to be false positives. The quality of the polyA tail present in MEIs is not taken into account at this step, since polyA tails inherently have low quality scores and penalizing them could lead to false negatives. At this point, each consensus sequence is reported with the following information: chromosome containing the breakpoint, position of the breakpoint, side of the soft-clipped sequence, reference allele, coverage at the breakpoint, consensus sequence and quality score.
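The three sub-steps of STEP-I can be sketched in Python as follows. Only the 5 bp minimum clip length and the default of 10 reads per cluster come from the text. A simple mean base-quality cut-off stands in for SeqOne's quality score (which also accounts for read length), and the neighbor window and limit and the majority-vote consensus are hypothetical choices. Filtering consensus sequences on length and mismatches is omitted. Reads are assumed to arrive as `(chrom, pos, cigar, seq, base_quals)` tuples.

```python
import re
from collections import Counter, defaultdict
from statistics import mean

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # CIGAR operations that advance along the reference

MIN_CLIP_LEN = 5         # from the text: soft clip must be at least 5 bp
MIN_CLUSTER_READS = 10   # from the text: default minimum reads per cluster
MIN_MEAN_BASEQ = 20      # hypothetical read-quality cut-off (mean Phred)
NEIGHBOR_WINDOW = 50     # hypothetical window (bp) for counting nearby breakpoints
MAX_NEIGHBORS = 3        # hypothetical limit on nearby candidate breakpoints


def soft_clips(pos, cigar, seq):
    """Yield (breakpoint, side, clipped_seq) for each end soft clip of a read.

    `pos` is the 1-based leftmost aligned reference position, as in SAM.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if ops and ops[0][1] == "S" and ops[0][0] >= MIN_CLIP_LEN:
        # Left clip: the breakpoint is the first aligned base.
        yield pos, "left", seq[: ops[0][0]]
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= MIN_CLIP_LEN:
        # Right clip: the breakpoint is the base just after the alignment.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        yield pos + ref_len, "right", seq[-ops[-1][0]:]


def consensus(seqs, side):
    """Majority-vote consensus, anchored at the breakpoint end of each clip."""
    # Left clips end at the breakpoint, so compare them from their last base.
    oriented = [s[::-1] for s in seqs] if side == "left" else list(seqs)
    length = max(len(s) for s in oriented)
    columns = (Counter(s[i] for s in oriented if i < len(s)) for i in range(length))
    cons = "".join(col.most_common(1)[0][0] for col in columns)
    return cons[::-1] if side == "left" else cons


def call_candidates(reads):
    """reads: iterable of (chrom, pos, cigar, seq, base_quals) tuples."""
    clusters = defaultdict(list)
    for chrom, pos, cigar, seq, quals in reads:
        if mean(quals) < MIN_MEAN_BASEQ:
            continue  # skip low-quality reads
        for bp, side, clip in soft_clips(pos, cigar, seq):
            clusters[(chrom, bp, side)].append(clip)

    candidates = []
    for (chrom, bp, side), clips in sorted(clusters.items()):
        if len(clips) < MIN_CLUSTER_READS:
            continue
        neighbors = sum(
            1 for (c, b, _) in clusters
            if c == chrom and b != bp and abs(b - bp) <= NEIGHBOR_WINDOW
        )
        if neighbors > MAX_NEIGHBORS:
            continue  # noisy region: too many nearby breakpoints
        candidates.append((chrom, bp, side, len(clips), consensus(clips, side)))
    return candidates
```

Each candidate carries the chromosome, breakpoint, clip side, supporting-read count and consensus sequence, mirroring part of the information listed in sub-step 3.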
  • STEP-II: MEI identification

The aim of this step is to align the retrieved consensus sequences (cs) against Dfam, a database of transposable elements, using nhmmer, and to return the breakpoints whose consensus sequences have the best alignments. To select the best alignments, the following filters are applied: E-value < 0.01 and alignment score > 30.
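The STEP-II filter can be sketched as a parser for nhmmer's tabular output. This assumes nhmmer was run with the Dfam profile HMMs as queries and the consensus sequences as targets (for example `nhmmer --tblout hits.txt Dfam.hmm consensus.fa`), and that the E-value and score sit in columns 13 and 14 of the `--tblout` format; check these against the HMMER User's Guide for your version. Keeping only the highest-scoring family per consensus sequence is also an assumption, and the example rows are made up.

```python
def best_hits(tblout_lines, max_evalue=0.01, min_score=30.0):
    """Return {consensus_id: (dfam_family, evalue, score)} for the best passing hit.

    Assumes nhmmer --tblout rows: target name (col 1), query name (col 3),
    E-value (col 13) and bit score (col 14), whitespace-separated.
    """
    best = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip comment and blank lines
        fields = line.split()
        target, family = fields[0], fields[2]
        evalue, score = float(fields[12]), float(fields[13])
        if evalue < max_evalue and score > min_score:  # thresholds from the text
            if target not in best or score > best[target][2]:
                best[target] = (family, evalue, score)
    return best


example = [
    "# target name  accession  query name ...",
    "cs_1 - AluY - 1 20 1 20 1 20 20 + 1.2e-08 45.3 0.1 -",
    "cs_1 - AluSx - 1 20 1 20 1 20 20 + 3.0e-06 40.1 0.2 -",
    "cs_2 - L1HS - 5 35 1 31 1 31 31 + 0.05 31.0 0.0 -",
]
print(best_hits(example))  # {'cs_1': ('AluY', 1.2e-08, 45.3)}
```

Here `cs_2` is dropped because its E-value exceeds 0.01, even though its score passes.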

  • STEP-III: MEI annotation 

This step takes several input files to annotate MEIs: the Dfam database file (.hmm), the aligned consensus-sequence file (.txt), refGene (.bed), RefSeq canonical transcripts (.tsv) and the reference genome (.fa). It outputs a VCF file containing the selected and annotated MEIs located in coding regions, which is finally merged with the VCF file containing the other variant types.
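The core of STEP-III, keeping MEIs that fall in coding regions and writing them as VCF records, might look like the Python sketch below. The symbolic ALT alleles follow the `INS:ME` convention of the VCF specification, but the subtype labels, the `ME_FAMILY` and `GENE` INFO keys, the gene name and the input tuple layout are hypothetical; a real VCF would also need matching `##ALT` and `##INFO` header lines, and a linear scan of the BED records would be replaced by an interval index at genome scale.

```python
# Symbolic ALT alleles using the VCF INS:ME convention (subtype labels assumed).
ME_ALT = {"Alu": "<INS:ME:ALU>", "L1": "<INS:ME:L1>", "SVA": "<INS:ME:SVA>"}


def overlapping_genes(chrom, pos, bed_records):
    """Genes whose BED interval (0-based, half-open) contains 1-based `pos`."""
    return [gene for c, start, end, gene in bed_records if c == chrom and start < pos <= end]


def annotate(candidates, bed_records):
    """candidates: (chrom, pos, ref_base, me_class, family) tuples.

    Returns VCF data lines for MEIs that fall inside the given coding regions.
    """
    lines = []
    for chrom, pos, ref_base, me_class, family in candidates:
        genes = overlapping_genes(chrom, pos, bed_records)
        if not genes:
            continue  # keep only MEIs inside coding regions
        info = f"SVTYPE=INS;ME_FAMILY={family};GENE={','.join(genes)}"
        lines.append("\t".join(
            [chrom, str(pos), ".", ref_base, ME_ALT[me_class], ".", "PASS", info]
        ))
    return lines


coding = [("chr1", 1000, 2000, "GENE_A")]  # hypothetical coding interval
calls = [("chr1", 1050, "A", "Alu", "AluY"), ("chr2", 10, "C", "L1", "L1HS")]
for record in annotate(calls, coding):
    print(record)
```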

The pipeline detects all three MEI types described above (L1, SVA and Alu).

The following diagram depicts the workflow developed by SeqOne: 

Figure 2: SeqOne workflow for the detection of MEI

Our workflow detected four validated Alu controls in gene panel validation data, presented in the following table:

Table 2: Alu validated controls detected with SeqOne pipeline

Conclusion

This document outlines the importance of detecting mobile element insertions (MEIs) and describes a new SeqOne functionality to identify them. This new approach calls several types of MEI events (LINE-1 or L1, SVA and Alu), and preliminary results on four validated Alu insertions support its accuracy. A growing number of scientific studies show that MEIs are involved in diseases including hemophilia, breast cancer and cystic fibrosis. However, owing to technical limitations and the need for dedicated bioinformatic pipelines, the involvement of MEIs in disease is currently underestimated. This new approach, included in our pipelines, extends our existing detection capabilities to provide a more complete view of pathogenic variants and support clinicians’ diagnoses.

References and Credits

We thank the French medical laboratory Cerba for providing some of the control samples mentioned in this article, and for their contribution to improving the performance of AluMEI in the early stages of its development.

1. Qian Y, Mancini-DiNardo D, Judkins T, Cox HC, Brown K, Elias M, et al. Identification of pathogenic retrotransposon insertions in cancer predisposition genes. Cancer Genet. 2017;216–217:159–69.

2. Ewing AD. Transposable element detection from whole genome sequence data. Mob DNA. 2015;6:24.

3. Hancks DC, Kazazian HH. Roles for retrotransposon insertions in human disease. Mob DNA. 2016;7:9.

4. Castaman G, Matino D. Hemophilia A and B: molecular and clinical similarities and differences. Haematologica. 2019;104:1702–9.

5. Mall MA, Hartl D. CFTR: cystic fibrosis and beyond. Eur Respir J. 2014;44:1042–54.

6. Azoury SC, Reddy S, Shukla V, Deng C-X. Fibroblast Growth Factor Receptor 2 (FGFR2) Mutation Related Syndromic Craniosynostosis. Int J Biol Sci. 2017;13:1479–88.

7. Torene RI, Galens K, Liu S, Arvai K, Borroto C, Scuffins J, et al. Mobile element insertion detection in 89,874 clinical exomes. Genet Med Off J Am Coll Med Genet. 2020.