A systematic analysis of mitochondrial aminoacyl tRNA synthetase variants in a rare disease cohort - European Journal of Human Genetics



To prioritise individuals for further investigation, we revised the ACMG classification criteria to include phenotype similarity scores that met the thresholds supported by the model evaluations. We manually reviewed the available phenotype data, and individuals identified as having a high likelihood of mt-aaRS-related disease, based on the variant classification and structured phenotype assessment, were highlighted to their recruiting clinicians through the GPAP. Here, we report the likely diagnostic rate.

The initial queries were performed among the 10,935 individuals visible to all registered users on RD-Connect GPAP. The first analysis filtered for potential compound heterozygous variants with gnomAD frequency <0.01, with filters as described. This search yielded 145 variants in 117 individuals, across 16 genes.
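The first query can be pictured as a grouping step: keep rare heterozygous variants, then retain individual-gene pairs carrying at least two of them. The following is a minimal sketch under stated assumptions, not the GPAP implementation; the record fields (`individual`, `gene`, `variant`, `zygosity`, `gnomad_af`) and the placeholder variant names are hypothetical, and only the <0.01 gnomAD cut-off comes from the text. The sketch does not establish phase, so every hit remains a presumed compound heterozygote.

```python
from collections import defaultdict

MAX_GNOMAD_AF = 0.01  # gnomAD frequency cut-off stated in the text


def candidate_compound_hets(variants, max_af=MAX_GNOMAD_AF):
    """Group rare heterozygous variants by (individual, gene) and keep
    pairs with at least two variants. Phase is NOT checked here."""
    grouped = defaultdict(list)
    for v in variants:
        if v["zygosity"] == "het" and v["gnomad_af"] < max_af:
            grouped[(v["individual"], v["gene"])].append(v["variant"])
    return {key: names for key, names in grouped.items() if len(names) >= 2}


# Toy records with hypothetical field names and placeholder variant labels.
example = [
    {"individual": "P1", "gene": "AARS2", "variant": "variant_a", "zygosity": "het", "gnomad_af": 0.0002},
    {"individual": "P1", "gene": "AARS2", "variant": "variant_b", "zygosity": "het", "gnomad_af": 0.004},
    {"individual": "P2", "gene": "VARS2", "variant": "variant_c", "zygosity": "het", "gnomad_af": 0.0005},
    {"individual": "P3", "gene": "EARS2", "variant": "variant_d", "zygosity": "het", "gnomad_af": 0.0003},
    {"individual": "P3", "gene": "EARS2", "variant": "variant_e", "zygosity": "het", "gnomad_af": 0.03},
]

candidates = candidate_compound_hets(example)
```

In this toy input only P1 is retained: P2 has a single variant, and one of P3's two variants exceeds the frequency cut-off.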

We screened the 145 variants to identify which genes had variants that were possibly pathogenic, using ACMG criteria. One individual had variants in IARS2 (MIM *612801) and YARS2, another in AARS2 and VARS2 (MIM *612802). The rest of the cohort had variants in only one gene. Following this screen, we excluded B/LB and false positive variants in the following genes: FARS2 (MIM *611592) (n = 2), HARS2 (n = 4), PARS2 (MIM *612036) (n = 4), SARS2 (n = 12), TARS2 (MIM *612805) (n = 4) and YARS2 (n = 8). Seven variants were deemed false positives due to mapping errors and poor quality control parameters; 54 of 145 distinct variants were classified as B/LB by at least one of the two tools, Varsome and Franklin. The initial ACMG classification identified at least one LP/P variant among the presumed compound heterozygous variants in the following genes: AARS2, CARS2 (MIM *612800), DARS2 (MIM *610956), EARS2 (MIM *612799), IARS2, LARS2 (MIM *604544), NARS2 (MIM *612803), RARS2 (MIM *611524), VARS2, WARS2 (MIM *604733). Therefore, these 10 genes were included in our study.

We identified 111 distinct variants presumed compound heterozygous in 98 patients in the 10 mt-aaRS genes in our cohort: AARS2 (n = 15), CARS2 (n = 8), DARS2 (n = 3), EARS2 (n = 18), IARS2 (n = 13), LARS2 (n = 6), NARS2 (n = 8), RARS2 (n = 10), VARS2 (n = 24), WARS2 (n = 6). Variants that were considered artefacts due to the sequence alignment being too close (n = 13) were excluded without further analysis. Two variants in AARS2, initially presumed to be in a compound heterozygous state (AARS2 NM_020745.4:c.1649G>C p.(Gly550Ala); and NM_020745.4:c.1621G>A p.(Glu541Lys)) were identified in 36 individuals in our cohort. However, following Integrative Genomics Viewer (IGV) analysis showing these in cis in the 36 individuals, these variants were excluded. One patient with another variant in AARS2 found in combination with these variants was excluded. Additionally, another participant with two presumed compound heterozygous variants was excluded because there were no HPO terms available for them in GPAP, precluding further analysis. Overall, 18/111 variants were excluded.

Further review of the remaining 93 distinct presumed compound heterozygous variants in 50 patients, using IGV analysis and assessment of quality control parameters and mapping errors, resulted in the exclusion of 36 variants in 20 individuals. DARS2 NM_018122.5:c.142G>T p.(Val48Phe) was found in two patients, but only one individual was included due to quality control metrics. Segregation analysis of the remaining 58 distinct presumed compound heterozygous variants resulted in the exclusion of six variants in three individuals. One of the two individuals with the same variants (IARS2 NM_018060.4:c.1488A>T p.(Leu496Phe), NM_018060.4:c.2739T>G p.(Phe913Leu)) was excluded due to lack of segregation, while the other remained in the final cohort with no segregation information available. Therefore, 54 variants were included in our final candidate list for possible mt-aaRS-related diseases in 27 individuals (Fig. 1).

A second analysis, filtering for homozygous variants with gnomAD frequency <0.01, gnomAD homozygous allele count equal to 0, with an internal GPAP frequency <0.02 and with high or moderate impact, identified 14 individuals with 15 homozygous variants. One individual had homozygous variants in both DARS2 and VARS2. After excluding variants not within the 10 mt-aaRS genes of interest, there were 10 variants in 11 individuals remaining as the homozygous candidates for mt-aaRS-related conditions (Fig. 1). Overall, after bioinformatic and manual filtering approaches, 38 individuals were evaluated with 63 distinct variants in the mt-aaRS genes of interest.
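The four conditions of the homozygous query combine as a single conjunction. The sketch below assumes hypothetical field names (`zygosity`, `gnomad_af`, `gnomad_hom_count`, `gpap_internal_af`, `impact`) and impact labels; only the thresholds are taken from the text, and the real GPAP filter interface may differ.

```python
def passes_homozygous_filter(v):
    """True if a variant record meets all homozygous-query conditions
    described in the text (field names are illustrative assumptions)."""
    return (
        v["zygosity"] == "hom"
        and v["gnomad_af"] < 0.01           # gnomAD frequency <0.01
        and v["gnomad_hom_count"] == 0      # no homozygotes in gnomAD
        and v["gpap_internal_af"] < 0.02    # internal GPAP frequency <0.02
        and v["impact"] in {"HIGH", "MODERATE"}
    )


# Toy records; values are invented for illustration only.
kept = {
    "zygosity": "hom", "gnomad_af": 0.0001, "gnomad_hom_count": 0,
    "gpap_internal_af": 0.001, "impact": "MODERATE",
}
seen_homozygous_in_gnomad = dict(kept, gnomad_hom_count=2)
low_impact = dict(kept, impact="LOW")
```

Here `kept` passes, while the other two records each fail exactly one condition.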

The variant curation (Fig. 1) revealed that 11/38 participants carried LP or P variants in one of the 10 mt-aaRS genes. These were either homozygous or presumed compound heterozygous but not confirmed to be in trans (Fig. 2). To further ascertain pathogenicity, phenotypic similarity of the individuals was investigated by establishing a reference phenotype-genotype database for the 10 mt-aaRS genes.

Following a PubMed literature review, clinical data on 234 published individuals, from 87 published articles, were manually curated as HPO terms associated with autosomal recessive disease caused by variants in the 10 mt-aaRS genes of interest (Table 1).

The HPO dataset from hpo.jax.org (downloaded 26/01/2025) was used to evaluate the known HPO-gene associations for the 10 mt-aaRS genes of interest. There were 336 non-redundant HPO terms associated with the 10 genes in the downloaded dataset and 957 HPO terms in the manually curated reference dataset, a difference reflected in the comparison of common HPO terms between the two datasets (Supplementary Fig. 1).

The shared terms between the datasets were evaluated (Fig. 3). There were six terms found in the downloaded dataset and not in the reference dataset: 'Long philtrum', 'Abnormal speech pattern', 'Cleft palate', 'Flexion contracture', 'Neurodegeneration', 'Nonimmune hydrops fetalis'. However, on further investigation, variations of all terms, apart from 'Long philtrum', appeared in the reference database. There were more HPO terms seen five or more times in the reference database and not seen in the downloaded dataset (coded as 'Not shared' in Fig. 3), suggesting that the reference database contained more discriminative terms that would aid phenotype similarity mapping in this study.
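The comparison described here amounts to set operations plus a frequency count. The sketch below is illustrative only: the function name and the toy term lists (`term_a`, `term_b`, `term_c`) are assumptions, and only 'Long philtrum' and the "five or more times" cut-off come from the text.

```python
from collections import Counter


def compare_term_sets(downloaded_terms, reference_observations, min_count=5):
    """Compare HPO terms from a downloaded gene-term list with per-patient
    term observations from a curated reference dataset."""
    downloaded = set(downloaded_terms)
    counts = Counter(reference_observations)  # how often each term was recorded
    reference = set(counts)
    return {
        "shared": downloaded & reference,
        "downloaded_only": downloaded - reference,
        "frequent_reference_only": {
            term for term, n in counts.items()
            if n >= min_count and term not in downloaded
        },
    }


# Toy inputs: placeholder labels, not real curated data.
downloaded_terms = ["Long philtrum", "term_a"]
reference_observations = ["term_a"] * 3 + ["term_b"] * 5 + ["term_c"] * 2

result = compare_term_sets(downloaded_terms, reference_observations)
```

With these toy inputs, `term_b` is the only frequently observed reference term absent from the downloaded list, which mirrors the 'Not shared' category in spirit.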

The mean phenotype similarity scores were assessed as previously described [28]. Fig. 4 shows that the scores were consistently >0.2 for published cases across the 10 mt-aaRS genes, and only 9 published individuals had a mean phenotype similarity score <0.3. Individuals with a DARS2 diagnosis tended to cluster closely together with higher mean phenotype similarity scores, suggesting less heterogeneous phenotypes in this group with predominant neurological features. This contrasted with individuals with IARS2 variant diagnoses (Fig. 4A) and is likely due to the different clinical presentations, including sideroblastic anaemia, dysmorphic features, MRI and EEG abnormalities, within the group of 13 published individuals with IARS2 (Table 1).

A total of 1520 molecularly diagnosed individuals from the GPAP were included in this study, consisting of 64 with mtDNA diseases, 118 with nuclear-mitochondrial gene diagnoses (including three with mt-aaRS deficiencies) and 1338 with other nuclear gene diagnoses. This dataset was combined with the reference mt-aaRS dataset (n = 234), resulting in a cohort of 1754 individuals. The data were partitioned into a training set (n = 1406; 191 mt-aaRS and 1215 'other' diagnoses) and a test set (n = 348; 46 mt-aaRS and 302 'other' diagnoses). This partitioning preserved the relative rarity of mt-aaRS-related diseases in the testing set while allowing model training with a larger and balanced representation of diagnoses in the training set.
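As a quick arithmetic check on the partition, the reported counts can be summed and the share of mt-aaRS cases computed for each split. This is a sketch using only the numbers stated above; the variable and function names are illustrative.

```python
# Counts as reported in the text.
train_split = {"mt_aars": 191, "other": 1215}
test_split = {"mt_aars": 46, "other": 302}


def mt_aars_fraction(split):
    """Share of mt-aaRS diagnoses within one partition."""
    return split["mt_aars"] / (split["mt_aars"] + split["other"])


train_total = sum(train_split.values())
test_total = sum(test_split.values())
combined_total = train_total + test_total

print(f"training set: n = {train_total}, mt-aaRS share = {mt_aars_fraction(train_split):.1%}")
print(f"test set:     n = {test_total}, mt-aaRS share = {mt_aars_fraction(test_split):.1%}")
print(f"combined:     n = {combined_total}")
```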

The GLM, when applied to the test set, achieved an accuracy of 85.5% (95% CI: 79.72-91.34%), with a sensitivity of 93.0% and specificity of 81.4%. The positive predictive value (PPV) was 76.7% and the negative predictive value (NPV) was 95.1%. The balanced accuracy was 85.5% and the Kappa value was 0.693, indicating moderate agreement between predictions and true diagnoses (Supplementary Table 1). The GLM demonstrated strong discriminatory power with an AUC of 0.932 (95% CI: 0.891-0.973) in the balanced dataset. The RF model demonstrated superior performance, achieving an accuracy of 94.2% (95% CI: 90.31-97.34%) on the test set. It showed a sensitivity of 90.7% and specificity of 96.5%. The PPV was 88.6%, while the NPV was 95.9%. The balanced accuracy was 91.3% and the Kappa value was 0.811, indicating strong agreement between predicted and actual diagnoses (Supplementary Table 1). The RF model exhibited high discriminatory power, with an AUC of 0.976 (95% CI: 0.955-0.998) in the balanced dataset (Fig. 4B). The mean similarity score was the most influential predictor in distinguishing 'mt-aaRS' from 'other' cases, with a variable importance score of 112.9. The inclusion of HPO count and average IC refined the model's ability to differentiate positive cases, but these predictors had lower contributions, with variable importance scores of 41.2 and 36.4, respectively.
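For readers less familiar with these summary statistics, the sketch below shows how each one is derived from a 2x2 confusion matrix using the standard definitions. The counts in the example are hypothetical and are not the study's confusion matrix, which is not given in this section; the function name is also illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification summaries from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```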

When comparing unbalanced data, the GLM achieved an AUC of 0.924 (95% CI: 0.886-0.963), while the RF model achieved an AUC of 0.953 (95% CI: 0.921-0.984), as seen in Fig. 4B. Both models effectively identified mt-aaRS-related diseases; however, the RF model consistently outperformed the GLM in balanced accuracy and specificity across datasets. For both models, using unbalanced and balanced datasets, the optimal threshold for the mean phenotype similarity score was 0.363-0.365, as identified using Youden's index (Table 2). Therefore, for the evaluation of the undiagnosed RD-Connect GPAP dataset, we used a mean phenotype similarity score of ≥0.3 to suggest supporting phenotype similarity and ≥0.4 to suggest moderate phenotype similarity.
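Youden's index selects the cut-off that maximises J = sensitivity + specificity - 1. The following is a minimal sketch of that selection on toy data; the scores and labels are invented for illustration, the function name is hypothetical, and the study's own analysis pipeline is not reproduced here.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising Youden's J, treating
    score >= threshold as a positive prediction. Labels are 1 or 0."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy mean similarity scores; 1 = 'mt-aaRS', 0 = 'other' (invented data).
toy_scores = [0.10, 0.20, 0.25, 0.35, 0.40, 0.50, 0.60]
toy_labels = [0, 0, 0, 1, 0, 1, 1]

threshold, j_stat = youden_threshold(toy_scores, toy_labels)
```

On this toy input the selected cut-off is 0.35, where all three positives and three of four negatives are classified correctly.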

We explored the use of phenotype similarity scores alongside variant annotation to evaluate undiagnosed individuals within RD-Connect GPAP. In the cohort of 38 undiagnosed individuals who carried rare recessive mt-aaRS gene variants, the following phenotype data were available: the minimum HPO term count was 1 (n = 12), the maximum was 13 (n = 1), and the mean number of HPO terms per individual was 4.08, with a median of 3. The 38 individuals were evaluated using the revised variant pathogenicity classifications, which incorporated the phenotype similarity evaluations. There were 9 individuals whose initial ACMG classification met LP or P criteria in at least one allele, with the other variant classified as VUS or higher. With the addition of the PP4 criterion (PP4 supporting for mean phenotype similarity scores ≥0.3 and <0.4; PP4 moderate for mean phenotype similarity scores ≥0.4), 11 individuals met criteria for further investigation based on variant pathogenicity. Furthermore, there were initially 14 individuals with a VUS in presumed biallelic or homozygous states; with the updated PP4 criterion, the revised variant pathogenicity classifications showed one fewer individual with presumed biallelic VUS. Note that one individual had a homozygous DARS2 NM_018122.5:c.812G>C p.(Arg271Thr) variant, which was classified as a VUS, and a homozygous VARS2 NM_020442.6:c.1010C>T p.(Thr337Ile) variant, which was classified as pathogenic, so overall there were 24 individuals with gene variants for possible or definite further investigation.
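The PP4 rule described above maps a mean phenotype similarity score to an evidence strength. A minimal sketch of that mapping follows; the function name and return labels are illustrative, and only the 0.3 and 0.4 cut-offs come from the text.

```python
def pp4_strength(mean_similarity):
    """Map a mean phenotype similarity score to a PP4 evidence level:
    >=0.4 -> moderate, >=0.3 and <0.4 -> supporting, otherwise not applied."""
    if mean_similarity >= 0.4:
        return "PP4_moderate"
    if mean_similarity >= 0.3:
        return "PP4_supporting"
    return None


# Example scores chosen to fall on either side of each boundary.
for score in (0.25, 0.30, 0.39, 0.40, 0.55):
    print(score, pp4_strength(score))
```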

The gene variants and HPO data were ranked by Exomiser using the ERN-RND 1837 genes list within GPAP. Of the 24 individuals with suspected mt-aaRS genetic diagnoses or requiring further investigation, 17 had variants that ranked within the top 10 gene variants. We found that 4/24 (17%) individuals had no Exomiser-ranked gene variant within the GPAP (Fig. 1, Supplementary Table), which highlights that our careful evaluation of the phenotype and genotype data associations provides added benefit. Overall, the addition of the individual-level phenotype similarity score upgraded the classification of seven variants to LP or P (Fig. 1) in six individuals, with an overall yield of 11/98 individuals (11.2%) with likely diagnoses.
