SETBP1 variants outside the canonical degron of the SETBP1 protein cluster in the SKI domain
In our study, we included 18 unrelated individuals carrying rare heterozygous variants with uncertain functional impact in SETBP1 (NM_015559.2), a gene under constraint against LoF and missense variation [pLoF: o/e = 0.02 (0.01-0.11); missense: o/e = 0.9 (0.84-0.95); gnomAD v.2.1.1] (Fig. 1a and Supplementary Data 1). Variants were identified via diagnostic exome or genome sequencing in various diagnostic genetic laboratories in different countries. In one case, the variant was first identified with direct Sanger sequencing of the SETBP1 gene based on clinical observations, followed by trio-based exome sequencing. We collected the clinical and genotype information from the individuals. None of the 18 individuals met the diagnostic criteria of Schinzel-Giedion Syndrome (SGS). Fifteen individuals carried a de novo SETBP1 variant, while the other two had inherited a variant from an affected parent. One individual harboured a missense variant that was not inherited from the mother; the father was unavailable for testing. Among the 15 individuals with a de novo variant, 14 carried a missense variant and one had an in-frame deletion [c.2885_2887del(CCA) p.(Thr962del)]. Within our cohort, there were multiple cases of recurrent identical de novo variants, including c.2572 G > A p.(Glu858Lys) found in four children and c.2584 G > A p.(Glu862Lys) in two individuals, revealing an independent mutational hotspot located in close proximity to the canonical degron region. None of the SETBP1 variants included in our study were present in the gnomAD v.2.1.1 database. Two individuals also carried variants affecting other known disease genes. In proband 3, who has a de novo c.2561 C > T p.(Ser854Phe) SETBP1 variant, an EHMT1 variant was identified. In proband 1, who has a c.1332 C > G p.(Ser444Arg) SETBP1 variant, an inherited 5q22.31 dup and WWOX variants were identified. 
However, these additional variants in probands 1 and 3 are unlikely to be pathogenic based on the patients' clinical features, the inheritance model, and the lack of functional impact observed in assays performed in cellular models (unpublished observations by co-author M.S.H. with the EHMT1 and WWOX variants).
An overview of the main clinical features per individual is shown in Supplementary Data 1. Individuals (n = 5) with a variant in close proximity to the degron within the SKI domain (affecting amino acids 862-874, excluding the degron 868-871) showed severe or profound intellectual disability and severe motor impairment with inability to walk without support. Four of these individuals were unable to speak. Three individuals showed spasticity. Focal and tonic-clonic seizures were noted in two of these individuals. Two individuals showed renal abnormalities: one had mild kidney dilatation, and another had medullary cystic kidneys. Shared facial features were present in at least three out of the five individuals, including prominent ears, shallow orbits, midface retrusion and microcephaly (Fig. 1b). Overall, the phenotypes of these five children did not fulfil the original Lehman et al. criteria for SGS. However, based on the severity of the phenotypes and facial features, they appeared more similar to (but less severe than) SGS compared to SETBP1-haploinsufficiency disorder.
Individuals (n = 7) with variants located slightly further away from the degron but still within the SKI domain (amino acids 854-858) showed mild or moderate intellectual disability. Two out of four individuals with a c.2572 G > A p.(Glu858Lys) variant were minimally verbal (see the speech and language section; the relevant data on this from the other two individuals were unavailable). All seven individuals showed motor delay, but the degree was much milder compared to the aforementioned cases. All those aged four years and older were able to walk without support, although one individual walked only limited distances. One individual of 3.5 years did not walk yet. Absence seizures were noted in two of these cases. One individual had asplenia. One individual had a non-progressive heart tumour of unknown origin. These individuals did not show visually recognisable or similar facial features (Fig. 1), nor did they show overlapping facial features with either classical SGS or SETBP1-haploinsufficiency disorder based on observations of clinicians. The individuals with inherited variants located furthest away from the degron and the SKI domain [c.1332 C > G p.(Ser444Arg) and c.1970T > C p.(Val657Ala)] showed mild intellectual disability or a low non-verbal IQ. In both cases, parents carrying the variant were similarly affected. For the remainder of the study, we therefore did not distinguish these inherited variants from the de novo variants outside the degron. All variants outside the degron were considered in functional assays as one group vs those within the degron (causing classical SGS).
Variants located outside the SKI domain (n = 4) were associated with a more variable clinical phenotype. One individual with an in-frame deletion removing a threonine residue c.2885_2887del(CCA) p.(Thr962del) showed a severe phenotype with severe speech delay, inability to walk and tonic-clonic seizures. This individual had bilateral ptosis and had surgery on the right upper eyelid and left strabismus surgery. The proband's facial features appeared similar to those of the individual with the c.2984 C > T p.(Leu957Pro) variant (Fig. 1b). They both showed a round face, blepharophimosis, hypertelorism and a short nose with a bulbous tip, features also often noted in individuals with SETBP1-haploinsufficiency disorder. The latter individual had a less severe neurodevelopmental phenotype. This individual started to walk at 22 months and was able to use sign language at the age of three years.
Next, we sought to use a quantitative method to better understand the phenotypic differences of individuals with SETBP1 variants and to aid diagnosis. We therefore utilised PhenoScore to quantitatively investigate whether individuals included in this study were more similar to individuals with SGS or to individuals with SETBP1-haploinsufficiency disorder. PhenoScore is an artificial intelligence-based phenomics framework that combines state-of-the-art facial recognition technology with analysis of phenotypic data in Human Phenotype Ontology (HPO) terms to quantify phenotypic similarity. We first performed a subgroup analysis to demonstrate that PhenoScore was able to distinguish SGS and SETBP1-haploinsufficiency disorder (phenotypic data and facial photographs of five individuals from each group) (Fig. 1c). We then used this trained PhenoScore model to generate individual predictions for all individuals in this study with a missense variant in SETBP1 outside the degron, to determine whether they were phenotypically more similar to individuals with SETBP1-haploinsufficiency disorder (score = 0) or with SGS (score = 1). We performed this prediction for facial features and HPO terms separately, and also as a combined prediction (PhenoScore). Intriguingly, when considering HPO terms alone, the majority of the individuals with variants outside the degron were more similar to those with haploinsufficiency disorder, or did not match with either SGS or haploinsufficiency disorder. However, the facial features of those with a variant directly adjacent to the degron were more similar to SGS, while those further away from the degron were more similar to haploinsufficiency disorder, suggesting a gradient for craniofacial feature formation. Interestingly, none of the tested variants outside the degron showed a PhenoScore similar to SGS; instead, the majority were classified as no match, while a few were classified as haploinsufficiency disorder (Fig. 1d, Supplementary Fig. 1b, c, and Supplementary Data 2). Thus, the clinical presentations of individuals with variants outside the degron were distinct from classical SGS and sometimes even from those associated with truncating variants, as could be distinguished using quantitative phenotypic features.
Speech/language data were available from seven individuals in this cohort, four of whom carry a variant within the SKI domain close to the degron (probands 3, 6, 8 and 14), while three individuals harbour a variant located far from the degron and outside the SKI domain (probands 1 and 18, and the affected mother of proband 1). In this cohort, speech development during infancy was characterised by limited babbling. A history of early feeding difficulties was also present for two children (probands 3 and 18). Language ability was generally low across the group (n = 7) for all subdomains, including expressive, receptive, written, and social language (Table 1 and Fig. 1e). The youngest children (<9 years of age; probands 3, 6, 8 and 18) present with a severe speech and language impairment. They are minimally verbal, defined as the presence of less than 50 spoken words (Table 1), and augment verbal communication with sign language, gestural communication and digital devices. The speech motor system is impaired across all individuals in the group, with CAS the most common diagnosis (n = 5), followed by mild dysarthria (proband 1 and affected mother) (Table 1). CAS features included hesitancy, groping, inconsistency of production, increased errors with increasing word length, simplified word and syllable structures relative to age, and vowel and prosodic errors. Dysarthria was typically characterised by a slower rate of speech, imprecision of consonants, altered nasal resonance and monotonous speech. The adult participants (proband 1 and affected mother) had a history of poor speech development but are now able to hold appropriate conversations and speak in full sentences with speech that is usually to always intelligible. All individuals in this cohort who performed speech/language assessment are attending (probands 3, 6, 8, and 18), or had attended (probands 1 and 14, and the affected mother of proband 1), speech therapy.
We used an array of computational tools to predict the functional effects of the observed SETBP1 variants. Among the 14 variants observed in 18 individuals, eight were located in the SKI domain while six were outside any known functional domain (Fig. 1a). Using a spatial clustering analysis, we showed that these variants outside the degron significantly clustered in exon 4 of the canonical SETBP1 transcript (NM_015559.2) corresponding to the SKI domain (corrected p-value = 9.99e-09, Bonferroni correction). All of the mutated amino acid sites were highly conserved across species with the exception of the threonine residue at position 962, which was conserved only in mammals (Supplementary Fig. 2a). All observed variants were predicted to be (likely) pathogenic by PolyPhen-2 and/or SIFT, and showed CADD-PHRED scores above 21 (Supplementary Data 3).
We went on to use the MetaDome web tool (v.1.0.1) (Supplementary Fig. 2b, c) and MTR viewer (Supplementary Fig. 2d) to visualise all SETBP1 missense variants in the tolerance landscape of the gene. Variants in the SKI domain are located in the regions of high intolerance, while the remaining variants, including those adjacent to the HCF1-binding site, are located in the less tolerant regions (Supplementary Fig. 2b-d). Of note, SETBP1 does not have a particularly high Z-score (1.1) for missense variants in the gnomAD database (v.2.1.1), indicating that the complete coding region of this gene is not extremely intolerant to missense variation. This observation is consistent with the results from the MetaDome and MTR score analyses, which show that only a few regions of SETBP1 have a high intolerance for missense variants, including the part of the SKI domain in which the majority of the observed variants are located (Supplementary Fig. 2b-d). We observed that 33% of the amino acid residues that were mutated by a germline SETBP1 missense variant were also mutated in somatic cells, particularly in haematopoietic and lymphoid cells, according to the COSMIC database (Catalogue of Somatic Mutations in Cancer v.94), including the recurrent missense variant p.(Glu858Lys). While missense variants within the degron (affecting amino acids 868-871) showed different frequencies in germline and somatic cells, consistent with the previously reported higher functional threshold in somatic cells, all observed missense variants outside the degron showed similar frequencies in germline and somatic cells (Supplementary Fig. 3 and Supplementary Data 4).
We went on to study the functional consequences of a representative selection of variants across SETBP1 on protein abundance and localisation, the SET/PP2A axis and cell proliferation, protein stability and degradation, and transcriptional regulation, using patient fibroblasts as well as HEK293T/17 cells transiently transfected with SETBP1 expression constructs. Based on the location and their distance to the canonical degron (Fig. 1a), we included in our assays two missense variants located furthest from the SKI domain [p.(Ser444Arg) and p.(Val657Ala)], three missense variants close to the degron [p.(Glu858Lys), p.(Glu862Lys), and p.(Leu957Pro)], and one de novo in-frame deletion [p.(Thr962del)]. In addition, we included in our assays four classical SGS variants located within the canonical degron [p.(Asp868Asn), p.(Ser869Asn), p.(Gly870Ser), and p.(Ile871Thr)] for functional comparisons to those outside the degron. In transcriptomics experiments with patient fibroblasts carrying different classes of germline SETBP1 variants, cell lines carrying variants within the degron (classical SGS) and cell lines carrying truncating variants (SETBP1-haploinsufficiency disorder) were included, again enabling functional comparisons to those outside the degron.
We first assessed the abundance of endogenous SETBP1 protein in fibroblasts derived from three patients carrying a SETBP1 variant outside the degron. Similar to those from SGS patients, all variants outside the degron showed higher SETBP1 protein levels than healthy controls (Fig. 2a, b) while SETBP1 transcript levels did not differ (Supplementary Fig. 4a). We next assessed the abundance of FLAG-tagged SETBP1 with an expanded array of variants in transfected HEK293T/17 cells, comparing cells with variant expression constructs to those with a wild type construct. Consistent with results from patient fibroblasts, all variants showed higher SETBP1 protein levels than the wild type (Fig. 2c, d) but also higher mRNA levels (Supplementary Fig. 4b).
Increased cell proliferation has been reported in EBV-transformed lymphoblastoid cell lines (LCLs) derived from patients carrying germline classical SGS variants and in leukaemic cells with somatic SETBP1 variants that drive development of myeloid malignancies. We therefore investigated the proliferation of fibroblasts derived from three individuals carrying a germline SETBP1 variant outside the degron [p.(Glu858Lys), p.(Leu957Pro), and p.(Thr962del)] compared to those from sex-matched healthy controls. For two of the three variants, we observed that fibroblasts displayed significantly faster proliferation and shorter doubling time than healthy controls in a time course experiment (Supplementary Fig. 4c-f). Interaction between SETBP1 and SET has been shown to stabilise SET, protecting SET from cleavage by protease, subsequently inhibiting PP2A activity and therefore promoting proliferation in HEK and leukaemia cells. We therefore examined the levels of SET in the fibroblasts. Variants outside the degron led to significantly altered SET levels in a group comparison to controls, with differential effects within groups (Fig. 2a, e and Supplementary Fig. 4g). To determine whether SETBP1 interaction with SET was affected by the patient variants, we performed co-immunoprecipitation assays in HEK293T/17 cells co-transfected with GFP-SET and FLAG-SETBP1 variants. Although we observed more abundant GFP-SET expression with increasing SETBP1 levels when co-expressed with FLAG-SETBP1 variants [p.(Leu957Pro) and p.(Thr962del)] compared to wild type, mutated versions of FLAG-SETBP1, including those that led to faster fibroblast proliferation, retained interaction with GFP-SET similar to wild type (Supplementary Fig. 4g, h).
Unexpectedly, we saw only marginal differences from wild type for cell proliferation (Supplementary Fig. 4d-f) and interaction with SET (Supplementary Fig. 4h, i) in cells carrying a recurrent missense variant [p.(Glu858Lys)] which has also been reported in leukaemia cells in atypical chronic myeloid leukaemia (aCML) patients. Moreover, we did not find any differences in PP2A/PP2A phosphorylation between patient and control fibroblasts (Fig. 2a, f), further suggesting that the aetiology involving identical variants in germline and somatic cells is likely to be cell-type specific. Overall, a subset of germline SETBP1 variants outside the degron leads to the accumulation of SETBP1 protein and promotes cell proliferation via a mechanism that is not driven by alterations in SETBP1/SET/PP2A interaction.
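As an illustration of how a doubling time can be derived from a time course experiment, the following minimal sketch assumes simple exponential growth between two cell counts. The function name and the cell counts are hypothetical and are not taken from this study; the actual analysis may have used a different fitting procedure.

```python
import math


def doubling_time(n_start, n_end, hours):
    """Doubling time in hours, assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start or hours <= 0:
        raise ValueError("need positive counts, net growth and a positive time span")
    # N(t) = N0 * 2^(t / Td)  =>  Td = t * ln(2) / ln(N(t) / N0)
    return hours * math.log(2) / math.log(n_end / n_start)


# Hypothetical counts (illustration only, not data from this study).
control_td = doubling_time(10_000, 40_000, 72)   # two doublings in 72 h
variant_td = doubling_time(10_000, 80_000, 72)   # three doublings in 72 h
print(f"control: {control_td:.1f} h, variant: {variant_td:.1f} h")
```

Under this assumption, a shorter doubling time corresponds directly to faster proliferation over the same interval.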
Next, we hypothesised that missense variants outside the degron might alter SETBP1 protein stability independent of mRNA levels. We therefore treated control and patient fibroblasts carrying variants outside the degron with cycloheximide to inhibit translation and examined the SETBP1 protein level. Following cycloheximide treatment, two out of three variants [p.(Glu858Lys) and p.(Leu957Pro)] showed reduced degradation (Fig. 3a), similar to what was previously reported for SGS variants, while p.(Thr962del) showed normal degradation, similar to controls (Fig. 3a). To assess the impact of a broader range of variants, we used HEK293T/17 cells transiently expressing YFP-tagged SETBP1, treated them with cycloheximide to inhibit translation, and measured relative fluorescence intensity over 24 h. We found that all classical SGS variants (within the degron) showed increased protein stability, whereas all variants outside the degron displayed similar stability to wild type (Fig. 3b and Supplementary Fig. 5). To evaluate the impact of variants on proteasome-mediated degradation, we treated HEK293T/17 cells expressing YFP-tagged SETBP1 with the proteasome inhibitor MG132. Surprisingly, and in contrast to previous reports, the classical SGS variants did not show impaired proteasome degradation, except for p.(Gly870Ser) (Fig. 3b and Supplementary Fig. 6). Moreover, three variants outside the degron [p.(Ser444Arg), p.(Glu858Lys), and p.(Leu957Pro)] demonstrated disrupted proteasome degradation to various extents (Fig. 3b and Supplementary Fig. 6). To assess whether the degradation of SETBP1 variants might be compensated by other protein degradation pathways, such as mTOR-dependent autophagy, we used the autophagy inhibitor BafilomycinA1 to treat HEK293T/17 cells expressing YFP-tagged SETBP1.
While the majority of variants outside the degron [p.(Ser444Arg), p.(Val657Ala), p.(Glu858Lys), and p.(Leu957Pro)] differed significantly in degradation via autophagy, most of the SGS variants were similar to wild type (Fig. 3b and Supplementary Fig. 7). These results further suggested a mechanism only partially overlapping with classical SGS. It is noteworthy that the direction and extent of degradation of the variant proteins were variable and appeared to depend on the distance of the variant from the degron region. Interestingly, protein stability and degradation of the in-frame deletion p.(Thr962del) were not affected (Fig. 3a, b), suggesting that this variant might operate via a different pathophysiological mechanism, in spite of increased abundance.
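To make the notion of protein stability in a cycloheximide chase concrete, the sketch below estimates a half-life from relative signal intensities, assuming first-order (single-exponential) decay and a least-squares fit on the log scale. This is an illustrative assumption rather than the analysis used here; the function name and the synthetic time courses are hypothetical.

```python
import math


def half_life(times_h, rel_levels):
    """Half-life (hours) from a least-squares fit of ln(level) against time.

    Assumes first-order decay: level(t) = level(0) * exp(-k * t).
    """
    if len(times_h) != len(rel_levels) or len(times_h) < 2:
        raise ValueError("need at least two paired measurements")
    if any(v <= 0 for v in rel_levels):
        raise ValueError("levels must be positive to take logarithms")
    logs = [math.log(v) for v in rel_levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx            # equals -k under the decay model
    if slope >= 0:
        return math.inf          # no measurable decay over the time course
    return -math.log(2) / slope


# Synthetic time courses over 24 h (illustration only, not study data).
times_h = [0, 4, 8, 12, 24]
wild_type_like = [0.5 ** (t / 6) for t in times_h]     # built with a 6 h half-life
stabilised_like = [0.5 ** (t / 18) for t in times_h]   # built with an 18 h half-life
wt_half_life = half_life(times_h, wild_type_like)
stab_half_life = half_life(times_h, stabilised_like)
print(f"{wt_half_life:.1f} h vs {stab_half_life:.1f} h")
```

In this framing, increased protein stability appears as a longer fitted half-life relative to wild type.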
In silico modelling of germline variants within the canonical degron that cause classical SGS has suggested effects on the interaction between the degron of β-catenin, which has a similar sequence to the βTrCP1 binding site in the SETBP1 degron, and ubiquitin E3 ligase βTrCP1. To investigate whether the observed differences in protein degradation were due to alterations in SETBP1 ubiquitination, we performed immunoprecipitation of FLAG-SETBP1 and assessed its ubiquitin level. Even though impaired proteasome degradation and autophagy were observed in two classical SGS variants and several variants outside the degron (Fig. 3b), ubiquitination was not significantly reduced in the majority of the tested variants [p.(Gly870Ser), p.(Glu858Lys), p.(Leu957Pro), and p.(Thr962del)] (Fig. 3c, d). Intriguingly, variants furthest from the degron showed significantly lower SETBP1 ubiquitination compared to wild type (Fig. 3c, d), consistent with the degradation assay results (Fig. 3b). Although based on in silico modelling of interaction with ubiquitin E3 ligase βTrCP1, the classical SGS variant p.(Asp868Asn) would be expected to show the strongest disruption in SETBP1 degradation, followed by p.(Gly870Ser), we did not see such a pattern in our proteasome and autophagy inhibition assays, nor in the level of ubiquitination. Taken together, these results suggest that accumulations of SETBP1 protein observed for a subset of variants are caused by variable disruptions in SETBP1 protein degradation via the proteasome and autophagy pathways. Other mechanisms are likely to contribute to higher SETBP1 protein levels in addition to pathways involving ubiquitination.
SETBP1 can bind to genomic DNA via its AT-hooks and function as a regulator of transcription. We went on to assess the effects of SETBP1 variants on the capacity of the protein to bind to AT-rich DNA sequences, using a luciferase reporter system. We generated two luciferase reporters, respectively carrying six repeats of the previously reported consensus AT-rich DNA binding sequences of SETBP1 (5'-AAAATAA-3' or 5'-AAAATAT-3'). The majority of the variants tested could still bind to AT-rich DNA sequences (Fig. 4a and Supplementary Fig. 8a, b). Of note, both variants [p.(Leu957Pro) and p.(Thr962del)] located close to the HCF binding domain (amino acids 991-994) showed significantly reduced AT-rich sequence binding capacity (Fig. 4a).
We next used a mammalian one-hybrid (M1H) assay to further delineate whether SETBP1 can induce transcriptional activity in the proximity of promoter regions without direct DNA binding. Wild-type SETBP1 fused with GAL4 showed significantly increased luciferase activity compared to empty controls and a reporter construct without a GAL4-binding site (Fig. 4b). These results confirmed the capacity of the protein to activate transcription in the vicinity of a promoter region without direct binding to DNA, consistent with its role as a chromatin remodeller. The majority of the variants could activate transcription (Fig. 4b). Interestingly, two SGS variants and two variants close to the degron [p.(Glu862Lys) and p.(Leu957Pro)] showed significantly higher transcriptional activity compared to wild type (Fig. 4b). In contrast, the two variants furthest from the degron [p.(Ser444Arg) and p.(Thr962del)] failed to activate transcription, appearing to be LoF (Fig. 4b).
A previously published chromatin immunoprecipitation sequencing (ChIP-Seq) dataset showed that FOXP2, rare genetic disruptions of which lead to CAS, was one of the 70 putative SETBP1 targets in HEK cells expressing a wild-type SETBP1 construct. We therefore first validated FOXP2 as a novel direct transcriptional target of SETBP1 and demonstrated that it could be activated by SETBP1 at two different sites using a luciferase reporter assay (Fig. 4c). Moreover, most of the variants that we tested led to reduced FOXP2 transcription activation (Fig. 4c). Notably, p.(Thr962del), which lacks only one threonine residue in the encoded SETBP1 protein, resulted in complete loss of function in all of our luciferase reporter assays, highlighting the importance of this residue for SETBP1 transcriptional activity. SETBP1 has been shown to be a largely nuclear protein, and so its potential mislocalization could lead to disruption of its function. However, we found that all SETBP1 variants localized to the nucleus as puncta similar to wild type when assessed in transiently transfected HEK293T/17 cells (Supplementary Fig. 9a, b) and endogenously in patient fibroblasts (Supplementary Fig. 9c). Taken together, these data suggest that pathogenic SETBP1 variants outside the degron reduce AT-rich DNA binding capacity and transcriptional activity of SETBP1 within the nucleus while preserving gross intracellular localization.
To assess whether the observed SETBP1 variants lead to a distinct gene expression signature compared to wild type and patients with SGS and haploinsufficiency disorder, we performed RNA-seq on fibroblasts derived from two individuals carrying germline SETBP1 variants within the degron/SGS, six with variants outside the degron, and three with truncating variants (Fig. 5a). Cells from eight healthy donors were included as controls (Fig. 5a and Supplementary Data 5). Principal component analysis (PCA) revealed that transcriptomic profiles of patient fibroblasts formed clusters separate from those of healthy individuals (Fig. 5b and Supplementary Fig. 10a). Fibroblasts carrying different types of SETBP1 variants formed a continuum, with those outside the degron sitting between within-degron/SGS and truncating variants (Fig. 5a). Thus, variants outside the degron showed partial transcriptomic overlaps with the two clinically distinct conditions from the prior literature. We then performed differential gene expression analysis on these three conditions, comparing each to healthy controls. This analysis identified 403 differentially expressed genes (DEGs) in fibroblasts with variants within the degron, 206 DEGs in those with variants outside the degron, and 452 DEGs in those with truncating variants, after filtering for genes with low expression (p < 0.05, FDR; log fold change ≤ -1 or ≥1) (Fig. 5c). Comparison of DEGs identified in fibroblasts carrying different types of SETBP1 variants revealed a number of DEGs shared by all three or by two conditions, as well as DEGs unique to each variant type (Fig. 5d), further suggesting partially overlapping mechanisms. We next performed gene ontology enrichment analyses of the consistent DEGs using the R package topGO to delineate the most relevant biological processes, molecular functions, and cellular components.
Functional annotation demonstrated over-representation (p < 0.05, Benjamini-Hochberg FDR) of an array of ontologies, some of which were partially overlapping and others of which were unique to different variant types (Fig. 5e), further supporting different pathophysiological mechanisms among SETBP1-related disorders. Several direct transcriptional targets of SETBP1 (MECOM, RUNX1, HOXA9, HOXA10 and MYB) have shown differential expression in leukaemia cells from aCML patients and in HEK cells overexpressing the p.(Gly870Ser) variant. However, we did not observe differential expression of these genes in our RNA-seq data, further suggesting that the aetiological pathways are likely to be cell-type specific. To identify overlap between gene-expression signatures of the different variant groups, we used rank-rank hypergeometric overlap (RRHO) analysis. RRHO is a threshold-free algorithm that measures the degree of overlap by stepping through two gene lists ranked by the degree of differential expression. These analyses confirmed that DEGs of within-degron variants showed only weak overlap with those of truncating variants. DEGs of outside-degron variants showed moderate overlap with those of truncating variants and only mild overlap with those of within-degron/SGS variants (Fig. 5f and Supplementary Fig. 10b). Taken together, SETBP1 variants outside the degron are associated with transcriptomic profiles that partially overlap with those of classical SGS variants within the degron, and truncating variants. Different variant types are linked to differences in gene ontologies, suggesting that there are not only shared aetiological mechanisms across different SETBP1-related disorders, but also mechanisms that are distinct.
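The stepping procedure underlying RRHO can be sketched as follows. This is a deliberately simplified, one-sided illustration (over-enrichment only, unsigned ranks, no multiple-testing correction) and not the published RRHO implementation used for the analyses above; the function names, the step size and the toy gene lists are assumptions made for the example.

```python
from math import comb, log10


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable, computed with exact integers."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / total


def rrho_grid(ranked_a, ranked_b, step=1):
    """-log10 hypergeometric p-value for the overlap of the top-i and top-j genes."""
    if set(ranked_a) != set(ranked_b):
        raise ValueError("both lists must rank the same genes")
    n_genes = len(ranked_a)
    grid = []
    for i in range(step, n_genes + 1, step):
        top_a = set(ranked_a[:i])
        row = []
        for j in range(step, n_genes + 1, step):
            overlap = len(top_a.intersection(ranked_b[:j]))
            p = hypergeom_sf(overlap, n_genes, i, j)
            row.append(-log10(p) if p > 0 else float("inf"))
        grid.append(row)
    return grid


# Toy rankings (hypothetical gene labels, illustration only).
genes = [f"g{i}" for i in range(1, 9)]
concordant = rrho_grid(genes, genes)          # identical rankings
discordant = rrho_grid(genes, genes[::-1])    # fully reversed rankings
peak_concordant = max(max(row) for row in concordant)
peak_discordant = max(max(row) for row in discordant)
```

In this toy setting, identical rankings give a clear peak in the grid, whereas fully reversed rankings give no over-enrichment signal at any pair of thresholds, which is the intuition behind reading strong, moderate or weak overlap from an RRHO heat map.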
To investigate whether and how SETBP1 variants affect differentiation and properties of human neurons, we differentiated patient fibroblasts carrying different types of variants into neurons using small molecules (Fig. 6a, b). We selected two variants from each group and assessed differentiation capacity. Compared to control fibroblasts, different groups of variants showed different neuronal differentiation capacity (Fig. 6c, e). While within-degron/SGS and outside-degron variants showed variable differentiation capacity, truncating variants consistently showed lower differentiation capacity into Tuj1-positive neurons (Fig. 6c, d). Although the differentiation capacity within groups was variable, neuronal morphologies within groups were more consistent. Induced neurons carrying within-degron variants had more mature morphologies, i.e. bipolar neurons with larger soma and longer neurites (Fig. 6e-g and Supplementary Fig. 11). Neurons with outside-degron variants showed normal soma but longer neurites with more branches (Fig. 6e-g), while those carrying truncating variants had normal soma and a multipolar morphology (Fig. 6e-g). Of note, Sholl profiles of neurons with outside-degron variants were significantly different from those with within-degron variants (p < 0.001) but not with truncating variants (p = 0.996), while neurons with within-degron variants were significantly different from truncating variants (p < 0.001) (all with one-way ANOVA and a post-hoc Tukey's test) (Fig. 6g and Supplementary Fig. 11).
We next performed RNA-seq to assess the global transcriptomic profiles of these cells. Day 0 fibroblasts formed separate clusters from induced neurons (days 10 and 12) (Supplementary Fig. 12), consistent with successful differentiation. Since days 10 and 12 neurons are similar, we processed them as one group ("d10/12 neurons") in downstream analyses. PCA of d10/12 neurons showed that PC1 explained 45% of the variance, mainly driven by variant groups and genotypes (Fig. 7a and Supplementary Fig. 13a). The three variant groups formed largely different clusters. The two controls also formed separate clusters, which could be due to differences in sex. We then performed differential gene expression analysis on these three conditions, comparing each to healthy controls, while regressing out effects caused by sex, batch, and days in vitro. This analysis identified different numbers of significant DEGs for different variant types (within-degron variants: 969; outside-degron variants: 1050; truncating variants: 491; p < 0.05, FDR; log fold change ≤ -1 or ≥1), after filtering for genes with low expression (Fig. 7b). Comparison of significant DEGs identified in induced neurons carrying different types of SETBP1 variants revealed a subset of DEGs present in all, two or more conditions but also some that were unique to each variant type (Fig. 7c), further suggesting partially overlapping mechanisms. RRHO analysis showed the weakest overlap between within-degron and truncating groups, as expected, since these two variant groups showed very different clinical and cellular phenotypes across measures (Supplementary Fig. 13b). We observed moderate overlap in DEGs identified in induced neurons carrying outside-degron and within-degron/SGS variants. There was overall weak overlap in DEGs identified in induced neurons with outside-degron and truncating variants, but the top up- and down-regulated genes were similar (Supplementary Fig. 13b). 
This overall weak overlap in DEGs could result from the significantly reduced differentiation, i.e. the lower proportion of neurons in the truncating variant condition as reflected by the results of cell counting (Fig. 6d). Using gene ontology enrichment analyses, we uncovered partially overlapping over-represented ontologies (p < 0.05, Benjamini-Hochberg FDR) and also some that were uniquely over-represented in different variant types (Fig. 7d). The significant DEGs of each of the three variant types were significantly enriched for genes associated with intellectual disability (within-degron variants: p = 3.23e-07; outside-degron variants: p = 3.07e-06; truncating variants: p = 1.12e-04; all Fisher's exact test) but not with autism (within-degron variants: p = 0.70; outside-degron variants: p = 0.94; truncating variants: p = 0.17; all Fisher's exact test) in the PanelApp database (intellectual disability v3.2 and autism v.0.22) (Source Data 15). Notably, FOXP2 was among the significant DEGs in induced neurons carrying outside-degron and within-degron/SGS variants but not for those carrying truncating variants, further suggesting that the aetiological pathways are likely to be cell-type and variant-group specific. Taken together, human induced neurons derived from patient cells carrying different SETBP1 variants are associated with partially overlapping transcriptomic profiles with shared and unique ontologies, again consistent with the notion that both common and distinct aetiological mechanisms contribute to different SETBP1-related disorders.
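For readers less familiar with gene-set enrichment testing, the sketch below shows a one-sided Fisher's exact test (equivalent to a hypergeometric upper tail) for the overlap between a DEG list and a disease gene panel. The choice of a one-sided test, the definition of the background gene set, and all gene labels are assumptions made for illustration; they are not specified in the text above and do not reproduce the reported p-values.

```python
from math import comb


def enrichment_p(degs, panel, background):
    """One-sided Fisher's exact test for over-representation of panel genes among DEGs.

    Returns (overlap size, p-value). Genes outside the background are ignored.
    """
    background = set(background)
    degs = set(degs) & background
    panel = set(panel) & background
    n_bg, n_panel, n_degs = len(background), len(panel), len(degs)
    overlap = len(degs & panel)
    # P(X >= overlap) when drawing n_degs genes from a background
    # containing n_panel panel genes.
    tail = sum(
        comb(n_panel, x) * comb(n_bg - n_panel, n_degs - x)
        for x in range(overlap, min(n_panel, n_degs) + 1)
    )
    return overlap, tail / comb(n_bg, n_degs)


# Hypothetical example: 20 expressed genes, 5 of them on the panel.
background = [f"gene{i}" for i in range(20)]
panel = background[:5]
degs = ["gene0", "gene1", "gene2", "gene3", "gene10"]   # 4 of 5 DEGs are panel genes
overlap, p_value = enrichment_p(degs, panel, background)
print(overlap, round(p_value, 4))
```

The result depends strongly on the background: restricting it to genes that passed the expression filter, as sketched here, is one common choice among several.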