ABSTRACT
Internal exon size in vertebrates occurs over a narrow size range. Experimentally, exons shorter than 50 nucleotides are poorly included in mRNA unless accompanied by strengthened splice sites or accessory sequences that act as splicing enhancers, suggesting steric interference between snRNPs and other splicing factors binding simultaneously to the 3′ and 5′ splice sites of microexons. Despite these problems, very small naturally occurring exons exist. Here we studied the factors and mechanism involved in recognizing a constitutively included six-nucleotide exon from the cardiac troponin T gene. Inclusion of this exon is dependent on an enhancer located downstream of the 5′ splice site. This enhancer contains six copies of the simple sequence GGGGCUG. The enhancer activates heterologous microexons and will work when located either upstream or downstream of the target exon, suggesting an ability to bind factors that bridge splicing units. A single copy of this sequence is sufficient for in vivo exon inclusion and is the binding site for the known bridging mammalian splicing factor 1 (SF1). The enhancer and its bound SF1 act to increase recognition of the upstream exon during exon definition, such that competition of in vitro reactions with RNAs containing the GGGGCUG repeated sequence depresses splicing of the upstream intron, assembly of the spliceosome on the 3′ splice site of the exon, and cross-linking of SF1. These results suggest a model in which SF1 bridges the small exon during initial assembly, thereby effectively extending the domain of the exon.
One of the fundamental problems in pre-mRNA splicing is initial splice site recognition. This is especially true for splicing in vertebrate pre-mRNAs with small exons surrounded by considerably larger introns. Observations that vertebrate splice sites are better recognized in pairs, especially exonic pairs (reviewed in references 3, 5, 11, 34, and 35), suggested one mechanism by which the splicing machinery detects small exons. In these exon bridging or definition events, concerted recognition of an exon occurs early during spliceosome assembly via interactions between U2 snRNP auxiliary factors (U2AF) bound to the pyrimidine tract of the 3′ splice site and U1 snRNPs bound to the 5′ splice site. Such interaction is thought to be either a direct interaction between the SR dipeptide containing subunits of U2AF (U2AF35) and U1 snRNPs (U1 70K protein) or an indirect contact mediated through SR proteins or other splicing factors bound to exonic enhancer sequences (11, 29, 34). Both U2AF and U1 snRNPs are thought to be general splicing factors used for recognition of all exons, although it has been possible to individually bypass requirements for both in vitro in the presence of saturating amounts of SR proteins (14, 28, 41).
The exonic model for splice site pairing can be contrasted to a more classical model in which splice sites interact across introns. Although such pairing can occur by the SR protein and U2AF-mediated model mentioned above, an alternative model for intron bridging invokes an interaction between U2AF65 and U1 snRNPs mediated by splicing factor 1 (SF1) in mammals or branchpoint binding protein (BBP) in Saccharomyces cerevisiae (1, 2). SF1 is required for initial ATP-dependent complex formation in mammals (25) and contacts the large subunit of U2AF (U2AF65) to promote binding of the latter to the branchpoint (4, 6-8). In yeast, BBP has been genetically shown to interact with proteins that are associated with U1 snRNPs (2). Although most of these U1-associated proteins (20, 23, 27) have not yet been reported to exist in vertebrates, the similarity between SF1 and BBP suggests that they do. Thus, U2AF becomes a pivotal branchpoint/pyrimidine tract binding protein that can communicate with either an upstream 5′ splice site (thereby bridging the intron) via SF1 or SR proteins or a downstream 5′ splice site (thereby bridging the exon) via SR proteins. Because these two interaction modes utilize different subunits of U2AF, both could occur simultaneously.
One prediction of models that pair splice sites across exons is that exon size should be important for recognition. Statistical data suggest optimal recognition of splice sites over a narrow size range (21), implying recognition problems for either small or large internal exons. Artificially shortening an internal exon leads to inefficient recognition (17, 18), presumably due to both deletion of exon accessory sequences and steric hindrance of factors, especially large factors like U1 and U2 snRNPs, binding simultaneously to bordering splice sites. Despite this problem, a number of very small exons exist that are constitutively included in vertebrate mRNAs. To gain insight into the mechanism of recognition of small exons, we have undertaken analysis of the sequences and factors required for recognition of the 6-nucleotide (nt) microexon 17 of the chicken cardiac troponin T gene (cTNT). Recognition of this exon requires a 130-nt sequence element located in the downstream intron (12). This element, termed the cTNT intron splicing enhancer (ISE), contains multiple copies of a short G-rich repeat, GGGGCUG, with the first repeat located immediately adjacent to the 5′ splice site. The cTNT ISE is capable of enhancing the recognition of heterologous small exons in vivo. In addition, the cTNT ISE is unique in that it can facilitate enhanced exon inclusion when located either upstream or downstream of its target exon. This latter property suggested that the ISE might be recognized by a factor capable of interacting with factors bound to both the 5′ and 3′ splice sites.
Here we report identification of one trans-acting factor recognizing the short G-rich repeat and this factor's interaction with the splicing machinery. UV-cross-linking experiments and immunoprecipitation identified a 90-kDa protein binding to the GGGGCUG element as SF1. The involvement of a known bridging protein in microexon recognition may suggest why the ISE functions to stimulate inclusion when positioned before or after target microexons. Using precursor RNAs consisting of an isolated internal exon followed by the ISE (i.e., substrates that contain a single 3′ and 5′ splice site in an exonic polarity), assembly of complex A was stimulated by the presence of the ISE downstream of the 5′ splice site and inhibited by competition with small RNAs containing the enhancer repeated GGGGCUG sequence. The binding of SF1 to the enhancer sequence GGGGCUG may be very similar to the binding of SF1 to the branchpoint sequence YNCURA because replacement of nucleotides surrounding the central CU in the branchpoint with G's, a conversion that would make a branchpoint resemble the cTNT enhancer, has been previously reported to have no effect on SF1 binding (6). These results suggest that the presence of an SF1 binding site downstream of a microexon effectively extends the domain of the microexon during early recognition so as to permit spliceosome assembly.
MATERIALS AND METHODS
Plasmid constructs. The constructs used for the experiment in Table 1 involved addition of the ISE to the intron preceding a heterologous miniexon and were based on the previously reported E3 minigene (12). The basal construct contained chicken skeletal troponin I (sTNI) exon 3 flanked by 111 and 173 nt of introns 2 and 3 (40). Wild-type (WT) or mutant (M1, M2, or M3, as defined in Fig. 3) constructs were created by the insertion of double-stranded oligonucleotides (purchased from Gibco BRL) containing a single ISE repeat sequence (with the additional bases CTAG at the 5′ end for annealing to the SpeI restriction site) into the SpeI site upstream of sTNI exon 3. The sequence of the exon whose inclusion is activated by the ISE is TGAAGAG, and that of its 3′ pyrimidine tract is CCTTTTCTCCCCTTTCTCTTCCTTCCCTTCCTCGCCCATCTACTCTCCCT. All constructs were confirmed by DNA sequencing.
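As a quick sanity check on the sequences quoted above, the following Python sketch reports the length of the ISE-activated exon and the pyrimidine content of its 3′ tract. The helper name `pyrimidine_fraction` is introduced here for illustration only and is not part of the original methods; the sequences are copied from the text in the DNA alphabet.

```python
# Sequences as quoted in the text (DNA alphabet).
STNI_EXON3 = "TGAAGAG"
STNI_PY_TRACT = "CCTTTTCTCCCCTTTCTCTTCCTTCCCTTCCTCGCCCATCTACTCTCCCT"


def pyrimidine_fraction(seq: str) -> float:
    """Return the fraction of pyrimidines (C, T, or U) in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CTU" for base in seq) / len(seq)


if __name__ == "__main__":
    print(f"exon length: {len(STNI_EXON3)} nt")
    print(f"tract length: {len(STNI_PY_TRACT)} nt")
    print(f"tract pyrimidine fraction: {pyrimidine_fraction(STNI_PY_TRACT):.2f}")
```

The same function can be applied to the other pyrimidine tracts listed in this section to compare their composition.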
The in vitro constructs (E45.ISE and E45.ISE OPP) used for the experiments in Fig. 2, 5, 7, and 8 have been described previously (12) and are based on cTNT exons and intron sequences. The in vitro substrates containing only the second exon (E5.ISE and E5.ISE OPP) were created by NcoI-XhoI deletion to remove 92 nt including the first exon. The second exon in this construct is 30 nt and has the sequence GTTCACAACCATCTAAGGCAAGATGTCCA, with a 3′ pyrimidine tract of CTTCTTCCCTTCCCTCCTCCCT.
The in vivo construct used for the experiment displayed in Fig. 6 was based on a β-globin construct, DUP33 (17). For cloning purposes, the second exon was removed from the parental construct via BbsI digestion and replaced with a short polylinker. The polylinker was digested and the second exon was reinserted, creating a unique BglII site upstream of the exon and a unique ClaI site downstream of the exon (construct TC 161). PCR-based mutagenesis was used to further modify this clone and create a StuI site located 9 nt downstream of the 5′ splice site of the second exon (construct TC 312). TC 161 was digested with ClaI and the entire cTNT ISE was inserted in both orientations to yield +121 sense and +121 antisense constructs. TC 312 was digested with StuI and a shortened ISE was inserted in both orientations to yield +29 sense and +29 antisense constructs. The second exon in this construct has the sequence GCTGCTGGTGGTCCATGGAGGCCCTGGGCAG, preceded by the 3′ pyrimidine tract TCTCTCGCCTATTGGTCTATTTTCCCACCCTT.
In vitro splicing. Standard splicing conditions consisted of 40% HeLa nuclear extract, 25 mM creatine phosphate, 0.625 mM ATP, 3.0 mM MgCl2, 1.2 mM dithiothreitol, and 1.56% polyethylene glycol 8000. In vitro splicing and splicing competition assays were performed in 25-μl reactions incubated under standard splicing conditions at 30°C for 60 min. In vitro assembly assays were incubated at 30°C under standard splicing conditions for the times indicated prior to the addition of heparin to a final concentration of 2 mg/ml and display on neutral gels (12). UV-cross-linking experiments, performed under standard splicing conditions, were incubated for 5 min at 30°C, after which heparin was added to the reaction at a final concentration of 2.4 mg/ml. Samples were then subjected to 10 min of UV irradiation on ice followed by digestion with RNases A and T1 and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Cross-linking reactions using synthetic RNA oligonucleotides (purchased from Oligos Etc.), phosphorylated with radiolabeled [γ-32P]ATP, were not subjected to RNase treatment.
Extract depletions. Standard HeLa nuclear extract was biochemically modified to create poly(G)-depleted nuclear extract. Extract was dialyzed to deplete glycerol and raise the KCl concentration to 1.0 M. Dialyzed extract was subsequently passed over a column containing poly(G) beads (Sigma catalog no. P 1908) in the same buffer. An approximately equal ratio of beads to extract was used. The flowthrough was collected and dialyzed against Roeder D (16).
Immunoprecipitations. Immunoprecipitations were performed using standard UV-cross-linking reactions. After exposure to UV irradiation, 10 μl of polyclonal SF1 or U2AF65-specific antipeptide antibody was added to the reaction mixture, which was then incubated for 60 min on ice. Immune complexes were collected on Gamma Bind G-Sepharose beads (Pharmacia). Samples were repeatedly washed in NET buffer (0.15 M NaCl, 0.05 M Tris-HCl [pH 7.5], 2 mM EDTA, 0.05% NP-40) after 60 min of constant rocking at 0°C. Immunoprecipitated proteins were visualized by SDS-PAGE followed by autoradiography.
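The final concentrations listed for the standard splicing reaction translate directly into per-reaction amounts. The Python sketch below works through that arithmetic for a 25-μl reaction; it assumes the 40% extract figure is volume per volume, and the function names are illustrative rather than taken from any protocol software.

```python
REACTION_VOLUME_UL = 25.0

# Final concentrations from the text, in mM.
FINAL_MM = {
    "creatine phosphate": 25.0,
    "ATP": 0.625,
    "MgCl2": 3.0,
    "dithiothreitol": 1.2,
}

# Assumption: "40% HeLa nuclear extract" is read as 40% (vol/vol).
EXTRACT_FRACTION = 0.40


def nmol_per_reaction(conc_mm: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Convert a final concentration to an amount: mM x microliters = nanomoles."""
    return conc_mm * volume_ul


def extract_volume_ul(volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Volume of nuclear extract in one reaction, in microliters."""
    return EXTRACT_FRACTION * volume_ul


if __name__ == "__main__":
    print(f"nuclear extract: {extract_volume_ul():.1f} ul")
    for name, conc in FINAL_MM.items():
        print(f"{name}: {nmol_per_reaction(conc):.3f} nmol")
```

Under these assumptions a single reaction contains 10 μl of extract; volumes of the individual stock solutions are not computed because the stock concentrations are not given in the text.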
RESULTS
GGGGCUG is the core repeat unit of the cTNT ISE. We previously described (12) the presence of an ISE located downstream of the 6-nt cTNT exon 17 (Fig. 1A). The 130-nt ISE is composed of several copies of a short G-rich repeat with a consensus of GGGGCUG (Fig. 1A and B). Multiple copies of this repeat are capable of enhancing the recognition and inclusion of a heterologous 7-nt microexon, sTNI exon 3. To further define the minimal sequence required to effect exon recognition, a single copy of the repeat containing either the WT or a mutant sequence was tested for the ability to facilitate exon inclusion.
Structure of the cTNT miniexon 17 and surrounding intron sequences including the G-rich ISE. (A) Exon-intron structure of the region of the cTNT gene including the 6-nt exon 17. (B) Sequence of the ISE. Exon 17 sequences are indicated within the box, the enhancer is indicated in capitals, and each G-rich repeat is underlined. (C) Comparison of the individual repeats within the ISE with the derived consensus.
Using a test system developed in a previous study (12), the 7-nt GGGGCUG sequence was placed upstream of a heterologous miniexon (sTNI exon 3) in a minigene in which exon inclusion could be assayed following transient transfection. In the absence of an added enhancer, only 13% of the mRNA included exon 3 (Table 1). The addition of a single copy of the G-rich repeat GGGGCUG (WT) enhanced the level of exon inclusion to around 50%, indicating that a single copy of GGGGCUG has enhancing capacity. Mutation of the repeat sequence to AAAACUG (M1), GGGGAAA (M2), or GUGUCAG (M3) destroyed enhanced levels of exon recognition and inclusion, producing levels of exon inclusion of 13, 11, and 14%, respectively. These data suggest that GGGGCUG is a specific splicing enhancer sequence and that an entire GGGGCUG sequence is necessary. The sequence GGGGGGG supported slightly improved exon inclusion compared to the other mutant elements (23% inclusion), suggesting that ISE-binding factors might demonstrate a binding preference for oligo(G).
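The pattern in these inclusion values can be made explicit by scoring each tested repeat for its two halves, the GGGG run and the CUG trinucleotide. The Python sketch below tabulates the percentages reported above (the WT value is given in the text only as "around 50%") and flags which sequences retain both halves; the function and variable names are illustrative and not part of the original analysis.

```python
# Percent exon inclusion reported in the text for each single-copy repeat.
INCLUSION = {
    "GGGGCUG": 50,  # WT; the text says "around 50%", so this value is approximate
    "AAAACUG": 13,  # M1
    "GGGGAAA": 11,  # M2
    "GUGUCAG": 14,  # M3
    "GGGGGGG": 23,  # oligo(G)
}
BASAL_INCLUSION = 13  # no added enhancer


def features(oligo: str) -> dict:
    """Report whether an oligo keeps the GGGG run and the CUG trinucleotide."""
    return {"g_run": "GGGG" in oligo, "cug": "CUG" in oligo}


def is_intact(oligo: str) -> bool:
    """True only when both halves of the repeat are present."""
    f = features(oligo)
    return f["g_run"] and f["cug"]


if __name__ == "__main__":
    for oligo, pct in INCLUSION.items():
        f = features(oligo)
        print(oligo, f"GGGG={f['g_run']}", f"CUG={f['cug']}",
              f"inclusion={pct}%", f"gain over basal={pct - BASAL_INCLUSION}")
```

Only the WT repeat carries both halves, which matches the observation that enhancement well above the basal level requires the entire GGGGCUG sequence.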
A single copy of the WT but not mutant repeat supports in vivo miniexon inclusion
An oligonucleotide containing the ISE sequence GGGGCUG competes the splicing enhancement afforded by the cTNT ISE. We have previously reported that the ISE enhances in vitro splicing of heterologous test substrates (12). To ascertain if the ISE GGGGCUG sequence is the binding site for factors involved in ISE-mediated enhancement, we performed competition experiments to test whether WT or mutant copies of GGGGCUG could inhibit in vitro splicing of precursor RNAs containing an ISE (Fig. 2). The in vitro splicing test substrates consisted of cTNT exon 4, intron 4, and the relatively small exon 5 (30 nt) with the cTNT ISE located downstream of exon 5 in either the sense (E45.ISE) or antisense (E45.ISE OPP) orientation. The presence of the enhancer stimulated splicing of this precursor RNA (Fig. 2B, compare lanes 1 and 2).
Competitor RNAs containing a WT GGGGCUG repeat sequence but not mutant sequence inhibit in vitro splicing using intact HeLa or poly(G)-depleted nuclear extracts. Increasing amounts of competitor RNAs as indicated were added to splicing reactions with a test precursor RNA using standard HeLa (A) or poly(G)-depleted nuclear extract (B). Depleted extract was passed over poly(G) at high salt; the unbound fraction was dialyzed against Roeder D buffer prior to assay. The test RNA substrate consisted of a weak two-exon precursor RNA dependent on the enhancer for maximal activity (12). The cTNT ISE was positioned downstream of the second exon to assay the effect of the enhancer sequence on splicing of intron 1. Competitor RNAs were added to 25-μl reactions at 200, 400, 600, and 800 pmol. Competitor sequences were GGGGCUG (WT), AAAACUG (M1), and GGGGAAA (M2). (B) In vitro splicing using poly(G)-depleted extract. Lanes 1 and 2 show splicing activity with precursors containing the WT ISE in either sense (E45.ISE) or antisense (E45.ISE.OPP) orientation to demonstrate that element-enhanced splicing occurs in poly(G)-depleted extract. Splicing reactions were performed for 60 min. RNAs were displayed on a 5% denaturing polyacrylamide gel. Precursor and product RNAs are indicated.
Radiolabeled in vitro-transcribed RNA substrates were incubated under standard splicing conditions in the presence of increasing amounts of RNA oligonucleotides corresponding to wild-type or mutant versions of the short G-rich repeat located in the ISE. As shown in Fig. 2A (lanes 1 to 4), the addition of increasing amounts of the specific RNA competitor GGGGCUG (WT RNA oligonucleotide) resulted in a reduction in the amount of spliced product RNA. Competition reduced the observed level of splicing down to the level observed when substrate lacking the enhancer sequence was used, suggesting that competition of the binding of factors to the enhancer was occurring. In contrast, the nonspecific competitor AAAACUG (M1 RNA oligonucleotide) or GGGGAAA (M2 RNA oligonucleotide) had little or no effect on the level of splicing (Fig. 2A, lanes 5 to 12). Thus, an RNA oligonucleotide consisting of a single G-rich repeat from the cTNT ISE is capable of competing the splicing enhancement afforded by the entire cTNT ISE. In addition, the inability of RNA oligonucleotides M1 and M2 to compete splicing activity supports the in vivo results suggesting that a functional enhancer unit requires both GGGG and CUG.
GGGGCUG binds to p90, a 90-kDa protein in nuclear extract. To determine what factor(s) binds to the G-rich repeat, UV-cross-linking experiments were performed. Radiolabeled RNA oligonucleotides containing the WT or mutant enhancer GGGGCUG sequence were incubated under splicing conditions. A number of proteins cross-linked to the WT sequence (Fig. 3A, lanes 1 to 4). A protein migrating at 90 kDa (p90) cross-linked to the WT oligonucleotide, not at all to mutant oligonucleotide AAAACUG or GUGUCAG, and very weakly to mutant oligonucleotide GGGGAAA. Two larger proteins with apparent molecular masses of 110 and 150 kDa also cross-linked to the WT oligonucleotide but not to AAAACUG or GUGUCAG. Both of these proteins, however, cross-linked at WT levels to the mutant GGGGAAA, suggesting less specificity for the WT GGGGCUG sequence than p90. In addition to these proteins, both hnRNP F and H (as determined by immunoprecipitation [data not shown]) strongly cross-linked to any of the oligonucleotides with high G content, including the mutant GGGGAAA.
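For orientation, the competitor amounts used in these titrations (200, 400, 600, and 800 pmol per 25-μl reaction, as given in the Fig. 2 legend) can be expressed as molar concentrations. The Python sketch below does the conversion under the simplifying assumption that the final reaction volume stays at 25 μl after competitor is added; the function name is illustrative.

```python
REACTION_VOLUME_UL = 25.0
COMPETITOR_PMOL = [200, 400, 600, 800]  # amounts stated in the Fig. 2 legend


def micromolar(pmol: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Convert an amount to a concentration: pmol per microliter equals micromolar."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return pmol / volume_ul


if __name__ == "__main__":
    for amount in COMPETITOR_PMOL:
        print(f"{amount} pmol in {REACTION_VOLUME_UL:g} ul -> {micromolar(amount):g} uM")
```

On this assumption the titration spans 8 to 32 μM competitor oligonucleotide.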
The ISE GGGGCUG repeated sequence can be UV cross-linked to a 90-kDa protein in both HeLa and poly(G)-depleted extract. (A) Radiolabeled RNA oligonucleotides consisting of WT or mutant ISE GGGGCUG sequence were UV cross-linked using standard splicing conditions in HeLa (lanes 1 to 4) or poly(G)-depleted (lanes 5 to 8) nuclear extracts. Cross-linked reactions were not treated with RNase prior to electrophoresis; therefore, the substrate oligonucleotides are still attached to the proteins and affect apparent molecular weights. Cross-linked bands of interest are indicated. (B) Competition of UV cross-linking of the 90-kDa protein by the WT ISE GGGGCUG sequence but not mutant sequences. Proteins that bind to the ISE GGGGCUG sequence were detected by UV cross-linking in the presence of increasing amounts of WT or mutant competitor RNAs using poly(G)-depleted nuclear extract. The cross-linking substrate (WT) consisted of a short radiolabeled RNA oligonucleotide, GGGGCUG. The competitors used were GGGGCUG (WT), AAAACUG (M1), and GGGGAAA (M2). The amount of added competitor RNA is indicated.
The ability of hnRNP F and H to strongly cross-link to mutant versions of the ISE GGGGCUG sequence suggested that their binding might not be pivotal to enhancer function. We therefore decided to adopt a purification strategy for ISE-binding proteins that would fractionate interesting proteins such as p90 away from hnRNP F and H. We chose chromatography on poly(G) as an initial step in this purification because of the strong affinity of hnRNP F and H for this resin. HeLa nuclear extract was passed over poly(G) at high salt (1.0 M KCl), conditions that should bind hnRNP F and H (30, 39) but which might not bind other proteins with lower affinity for G nucleotides. UV cross-linking with the flowthrough fraction from such chromatography [referred to here as the poly(G)-depleted extract] revealed that the hnRNP F and H concentrations were substantially reduced by this procedure but that p90 remained, as did the 110- and 150-kDa proteins. Furthermore, p90 retained its cross-linking preference for the WT ISE sequence, suggesting that fractionation had not removed specificity (Fig. 3A, lanes 5 to 8).
Splicing assays performed with the poly(G)-depleted extract were used to ascertain if the depleted extract maintained ISE-dependent splicing enhancement (Fig. 2B). The splicing of the test substrate was still ISE dependent in the depleted extract, indicating that factors necessary for the ISE effect remained following depletion. The ISE-mediated enhancement in the depleted extract was competed by WT oligonucleotides but not the AAAACUG mutant. A competitor with the sequence GGGGAAA was capable of competing this activity but to a lesser extent than the WT oligonucleotide. Competitions were more effective in the depleted extract, suggesting removal of proteins that bound the sequence but which were not essential for enhancer activity (compare Fig. 2A and B).
GGGGCUG competes UV cross-linking to p90. The preceding experiments suggested that p90 was an interesting candidate for an enhancer-binding protein because the protein failed to cross-link to a mutant ISE sequence and survived poly(G) depletion. Oligonucleotide competition experiments were used to demonstrate that the failure of p90 to UV cross-link to mutant ISE sequences was caused by a failure to bind rather than an inability to cross-link due to a sequence change in the mutant substrates (Fig. 3B). The radiolabeled substrate used in these reactions, the WT RNA oligonucleotide, was incubated in poly(G)-depleted HeLa nuclear extract in the presence of increasing amounts of unlabeled competitor RNAs. Self-competition resulted in a decrease in radiolabeled p90 as well as a decrease in the 150-kDa species but no decrease in p110 (Fig. 3B, lanes 1 to 5). This experiment indicated that p90 and p150 might be interesting enhancer-binding proteins to study, but not p110 (indeed, p110 was sequenced and found to be nucleolin [data not shown]). The M1 RNA oligonucleotide competitor (AAAACUG) had little to no effect on the pattern of cross-linked proteins (Fig. 3B, lanes 6 to 9). Similar to the WT RNA oligonucleotide, addition of the M2 RNA oligonucleotide (GGGGAAA) resulted in decreased levels of cross-linked p90 and the 150-kDa species (Fig. 3B, lanes 10 to 13). This experiment also demonstrated a highly reproducible distinction between p90 and the 150-kDa species: p90 appeared to preferentially bind to GGGGCUG, while the 150-kDa protein preferentially bound GGGGAAA as assayed by competition experiments (Fig. 3B, compare lanes 2 to 5 with lanes 10 to 13). This result concentrated our attention on p90 as the protein of immediate interest.
p90 is SF1. Fractionation of nuclear extract aimed at biochemical isolation of p90 demonstrated that the cross-linked protein had several characteristics in common with mammalian SF1. Most significantly, SF1 and p90 both demonstrated affinity for poly(G) and did not bind to DEAE at low to moderate salt concentrations (reference 25 and data not shown). Both factors were present in S100 as well as nuclear extract (Fig. 4). Although p90 is slightly larger than SF1 (75 kDa), in our tests of cross-linking to oligonucleotides, no RNase treatment was used so that the observed species still contained a full-length oligonucleotide covalently coupled to the protein. Therefore, the cross-linked species could well be SF1 despite its slightly greater size. The similarities in properties of p90 and SF1 suggested that p90 and SF1 could be related.
SF1-specific antibodies immunoprecipitate the 90-kDa UV-cross-linked protein. Radiolabeled GGGGCUG (WT) and AAAACUG (M1) RNA oligonucleotides were UV cross-linked to proteins in HeLa or S100 extracts. Following cross-linking, the reactions were subjected to immunoprecipitation using antisera specific for SF1 or U2AF proteins as indicated. Lanes 8 and 9 contain 2 and 4 μl of the cross-linked reaction using nuclear extract without immunoprecipitation. The positions of SF1 and p90 are indicated.
To test this hypothesis, antiserum specific for SF1 was used to immunoprecipitate UV-cross-linked proteins. The SF1-specific antipeptide antibody H3 (33) was capable of precipitating p90 cross-linked to the WT RNA oligonucleotide in both nuclear and S100 extracts (Fig. 4, lanes 2 and 6). In contrast, beads alone (Fig. 4, lanes 1 and 5) or polyclonal antiserum specific for the 65-kDa subunit of U2AF (Fig. 4, lane 4) did not immunoprecipitate p90. More importantly, the anti-SF1 antibody did not immunoprecipitate proteins cross-linked to a control mutant RNA (Fig. 4, lanes 3 and 7). Reactivity with the anti-SF1 antibody demonstrated that p90 is the previously identified splicing factor SF1.
The cTNT ISE enhances initial spliceosome formation. SF1 has been implicated in the earliest steps of spliceosome formation (2, 7, 8, 25). If SF1 binds the ISE GGGGCUG sequences, then the enhancer should enhance early spliceosome assembly. To determine if the ISE affects these steps in vitro, spliceosome assembly assays were performed. In these experiments, substrates were tested for the ability to form ATP-dependent splicing complexes in the presence or absence of the cTNT ISE. Two single-intron constructs, E45.ISE and E45.ISE OPP, were used. In these constructs, the ISE is positioned downstream of exon 2 within the incomplete intron 2. The assembly being assayed is that of intron 1; therefore, in this precursor RNA the enhancer is affecting exon definition of exon 2 and subsequent assembly of intron 1. The presence of the ISE in the sense orientation, E45.ISE, facilitated increased levels of complex A formation compared to the substrate containing the ISE in the antisense orientation, E45.ISE OPP (Fig. 5A). The E45.ISE construct also had enhanced levels of complex B formation, presumably due to the increased levels of complex A. This finding indicates that ISE-binding proteins binding to a sequence downstream of the second exon promote recognition of the upstream intron, suggesting that the ISE is involved in defining the second exon and implicating SF1 in exon definition of an upstream exon.
The ISE stimulates assembly of complex A. (A) Two-exon, single-intron substrates with the ISE positioned downstream of the second exon in the sense (E45.ISE) or antisense (E45.ISE.OPP) orientation were incubated under standard splicing conditions for the times indicated, and spliceosome complexes were visualized via native gel electrophoresis. Splicing complexes H, A, and B are indicated. (B) Assembly of substrates consisting of a single internal exon followed by the ISE in the sense (E5.ISE) or antisense (E5.ISE.OPP) orientation. Substrates are diagrammed below the gels. Substrates are identical to or derived from those in Fig. 2.
To further test the role of the ISE in exon definition, assembly substrates were created that contained only the single internal exon, exon 5, followed by the ISE in either the sense or antisense orientation (E5.ISE and E5.ISE OPP). Such isolated exon precursor RNAs are capable of forming only complex A (Fig. 5B). The construct containing the ISE in the sense orientation (E5.ISE) assembled significantly better than the substrate containing the ISE in the antisense orientation (E5.ISE.OPP). The increased level of complex A formation in the presence of the ISE suggested that SF1 binding to the cTNT ISE promotes early steps in exon recognition.
The cTNT ISE is a distance-dependent element. The observations in Fig. 5 indicated that the ISE participates in exon definition when downstream of a target exon. In its parent gene, the ISE is positioned immediately downstream of the exon beginning one nucleotide beyond the 5′ splice site. This positioning coupled with the observation that the element is recognized by a protein rather than an snRNP suggested that the element might need to be close to the exon it affects. To test this hypothesis, we created minigenes for in vivo transfections in which the ISE was positioned at different distances downstream of a target small exon.
The gene used in this experiment was the artificially created DUP33 construct based on the β-globin gene that only minimally includes the middle 33-nt exon unless supplemented with improved splice site sequences or enhancers (17, 18). The basal construct was modified to add the ISE, in both sense and antisense orientations, either 29 or 121 nt downstream of the 5′ splice site of exon 2. Reverse transcription-PCR (RT-PCR) analysis of in vivo RNA splicing phenotypes following transfection of HeLa cells demonstrated that the construct containing the ISE in a sense but not antisense orientation directed considerable inclusion of the middle exon when it was placed 29 nt downstream of the 5′ splice site (Fig. 6, lanes 2 and 3). Even a modest increase in the distance between the 5′ splice site and enhancer to 121 nt reduced splicing enhancement compared to the control (Fig. 6, lanes 4 and 5). These data demonstrate that the cTNT ISE is a distance-dependent element. Such a distance requirement suggests that factors binding to the enhancer, including SF1, must be placed near the exon to have effect.
The ISE functions to activate an upstream exon only when positioned close to the 5′ splice site. The ISE was positioned either 29 or 121 nt downstream of the second exon in the three-exon β-globin substrate DUP33 (17). Both sense and antisense versions of the enhancer were used. The constructs were transfected into HeLa cells, and RNA splicing phenotypes were assessed using RT-PCR and primers specific for exons 1 and 3. Species resulting from inclusion or skipping of the middle exon are indicated.
DISCUSSION
Studies directed at understanding the recognition of small vertebrate exons have led to the consistent observation that intronic enhancers located adjacent to the exons are necessary for efficient exon recognition and splicing (9, 10, 12, 18, 32, 36, 40, 44, 47). In this study we have identified a core repeat unit within such an enhancer and a protein recognizing the core sequence. The intron enhancer under study (ISE) is located immediately adjacent to the 5′ splice site of a 6-nt microexon from the cTNT gene. The ISE is required for both splicing and initial recognition of the microexon. Within the 130-nt ISE are six copies of the short G-rich repeat GGGGCUG. RNA oligonucleotides containing this short sequence were able to compete the splicing enhancement afforded by the entire 130-nt ISE in in vitro splicing and spliceosome assembly assays.
UV-cross-linking experiments demonstrated that the mammalian splicing factor SF1 binds to the ISE GGGGCUG repeated sequence. Consistent with SF1 binding to the ISE, the presence of the ISE enhanced early steps of spliceosome assembly, as assayed by complex A formation. Oligonucleotides containing the ISE GGGGCUG sequence effectively competed splicing, complex A assembly, and SF1 cross-linking. These observations suggest that SF1 binding to the ISE GGGGCUG sequences participates in exon definition of the microexon. Mutant ISE GGGGCUG sequences that failed to bind SF1 and failed to compete ISE-dependent in vitro splicing also failed to stimulate in vivo recognition of the ISE, suggesting that the binding of SF1 to the enhancer is required for in vivo activity. Because we were unable to use available reagents to effectively deplete endogenous SF1, however, formal proof of the requirement for SF1 in this splicing reaction must await further experimentation.
SF1 has recently been shown to bind to the branchpoint sequence, which at first glance bears little similarity to the enhancer GGGGCUG sequence. Mutation studies of nucleotides important for SF1 recognition of branchpoints (6), however, indicate that the central CUA of the yeast UACUAAC branchpoint is critical for binding, although the A could be altered to a G without serious impairment of binding (Fig. 7C). Thus, SF1 would be predicted to bind via its KH domain to the terminal CUG within the enhancer repeat. In addition, mutational analysis also indicated that mutation of the two nucleotides within the branchpoint preceding the CUA to G's also had little effect on SF1 binding (6). Therefore, binding of SF1 to the ISE repeated sequence GGGGCUG is not surprising.
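The comparison between the branchpoint and the enhancer repeat can be illustrated with a simple pattern match. The Python sketch below translates the YNCURA branchpoint consensus quoted in the introduction into a regular expression using standard IUPAC nucleotide codes, and contrasts it with a relaxed core pattern, CUR, that captures the tolerated A-to-G change described above. The relaxed pattern and all function names are illustrative choices made for this example, not a published binding model.

```python
import re

# Standard IUPAC nucleotide codes (RNA alphabet), limited to those needed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[CU]",    # pyrimidine
    "R": "[AG]",    # purine
    "N": "[ACGU]",  # any nucleotide
}


def iupac_to_regex(pattern: str) -> re.Pattern:
    """Compile an IUPAC nucleotide pattern into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))


BRANCHPOINT_CONSENSUS = iupac_to_regex("YNCURA")  # consensus quoted in the text
RELAXED_CORE = iupac_to_regex("CUR")              # illustrative: CUA or CUG

YEAST_BRANCHPOINT = "UACUAAC"
ENHANCER_REPEAT = "GGGGCUG"

if __name__ == "__main__":
    for name, seq in [("yeast branchpoint", YEAST_BRANCHPOINT),
                      ("enhancer repeat", ENHANCER_REPEAT)]:
        strict = BRANCHPOINT_CONSENSUS.search(seq)
        core = RELAXED_CORE.search(seq)
        print(name, seq,
              "strict:", strict.group() if strict else None,
              "core:", core.group() if core else None)
```

The enhancer repeat does not satisfy the strict consensus, which is why the two sequences look dissimilar at first glance, yet both contain a CUR core, consistent with the mutational data cited in the text.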
Fig. 7. Model for SF1 binding to the enhancer GGGGCUG sequence. (A) Comparison of the zinc knuckle domains within SF1 (bone) and the HIV nucleocapsid protein NL4-3. The zinc knuckle is indicated by the box, and amino acids found to contact RNA within the HIV protein are indicated with asterisks. (B) Model for binding of SF1 to the enhancer GGGGCUG sequence. (C) Alignment of the enhancer core with branchpoints from mammalian or yeast genes.
The zinc knuckle within SF1 could also participate in RNA binding. Studies with another zinc knuckle-containing protein, the human immunodeficiency virus (HIV) nucleocapsid protein, indicate a preferred GGAG binding site for this zinc knuckle (15), similar to the sequence at the beginning of the enhancer repeat. The region of the HIV protein containing the zinc knuckle has 37% overall identity with SF1 in this region (Fig. 7A), suggesting they may have similar binding preferences. In addition, purified SF1 has a preference for binding to poly(G) at low salt (4). These observations suggest a model for binding (Fig. 7B) in which both the zinc knuckle and the KH domain contact the enhancer GGGGCUG in a fashion similar to that postulated for contact between BBP and branchpoints (8).
Our data suggest a mechanism whereby SF1 activates exon definition. We suggest, in the case of cTNT exon 17, that SF1 (or an SF1-U2AF heterodimer) binds to the short G-rich repeat GGGGCUG located in the ISE downstream of the 5′ splice site of this very small exon. After initial interactions with the RNA, bound SF1 stabilizes association of U2AF65 to the upstream pyrimidine tract. In this model, SF1 becomes an exon bridging factor rather than an intron bridging factor (Fig. 8). Binding of SF1 to the ISE and activation of U2AF may help to explain the previously startling observation that the ISE functions to activate exon inclusion of microexons when positioned either downstream or upstream of the target exon (12); i.e., as a bridging protein, SF1 should be able to operate in either an exonic or intronic polarity.
Fig. 8. Model for SF1-mediated exon versus intron bridging. (A) Intron bridging as described for S. cerevisiae; (B) exon bridging as discussed for the exon in this study.
It should be noted that the 3′ splice site of cTNT exon 17 contains the sequence GGUGCUG located at −10 to −4 with respect to the 3′ splice site, suggesting that in the natural context, SF1 could bind to sequences immediately flanking both the 3′ and 5′ splice sites of the exon. This upstream copy must not be required for SF1 activation, however, because the three heterologous exons used in this study, the 7-nt sTNI exon 3, the 30-nt exon 5 from cTNT, and the artificial 33-nt exon from the β-globin-derived DUP33, are all enhanced by the ISE despite the absence of GGGGCUG sequences within their 3′ splice site regions. All of the utilized and responsive exons have long U-poor, C-rich polypyrimidine tracts (Fig. 1 and Materials and Methods). The four 3′ splice sites have an average pyrimidine tract length of 32 nt with 50% C's. More importantly, the longest uninterrupted U tract within the four 3′ splice sites is only 4 nt. Therefore, each would be predicted to be a poor binding site for U2AF. It will be interesting to see if SF1-mediated exon definition of small exons such as these is accompanied by variations in the constellation of proteins that bind the branchpoint-polypyrimidine tract during early spliceosome assembly.
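The tract properties discussed above (overall length, C content, and longest uninterrupted U run) can be tabulated with a short script. The following is a minimal sketch, not the procedure used in this study; the function name `pyrimidine_tract_stats` and the example sequence are assumptions introduced purely for illustration, and the sequence is invented rather than taken from any of the four 3′ splice sites.

```python
import re


def pyrimidine_tract_stats(tract: str) -> dict:
    """Summarize an RNA polypyrimidine tract given as a string of A/C/G/U.

    Hypothetical helper for illustration; DNA-style T is treated as U.
    """
    seq = tract.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty tract")
    # Every maximal run of consecutive U residues.
    u_runs = re.findall(r"U+", seq)
    return {
        "length": len(seq),
        "fraction_c": seq.count("C") / len(seq),
        "fraction_u": seq.count("U") / len(seq),
        "longest_u_run": max((len(run) for run in u_runs), default=0),
    }


# Invented tract for illustration only; not a sequence from this study.
example_tract = "CCUCCUUCCCUCUCCUUUCC"
stats = pyrimidine_tract_stats(example_tract)
print(stats)
```

A C-rich tract with only short U runs, like the invented one here, is the kind of sequence the text predicts to be a poor U2AF binding site; the script only reports composition and makes no binding prediction itself.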
We would also suggest that recognition of 5′ splice site-proximal G-rich sequences by SF1 may be a common mechanism in vertebrate splicing. Statistical analysis of sequence composition downstream of 5′ splice sites has indicated an unusually high concentration of G-rich sequences (19, 31). In addition, other small exons appear to have GGGGCUG elements located in adjacent introns. Examples include the alternatively spliced 18-nt murine src N1 exon, which contains two copies of this repeat located near other intronic accessory sequences (10), and the human p53 gene, containing multiple copies of the repeat located downstream of 22-nt exon 3. In addition, alternative splicing within the chicken β-tropomyosin gene is dependent on a G-rich repeat (consensus [A/U]GGG) found six times immediately downstream of the second mutually exclusive exon 6B (38). Although this latter element does not have the complete sequence of the cTNT ISE GGGGCUG repeat, it is similar and could be recognized by a bridging protein.
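A search for such elements in intronic sequence downstream of a 5′ splice site can be sketched as a simple pattern scan. This is an illustrative sketch under stated assumptions, not the analysis behind the examples cited above: the function name `find_motif` and the intron fragment are hypothetical, and only the two patterns (GGGGCUG and the [A/U]GGG consensus) come from the text.

```python
import re


def find_motif(seq: str, pattern: str) -> list:
    """Return 0-based start positions of all matches, including overlapping ones.

    Hypothetical helper for illustration; DNA-style T is treated as U.
    """
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead lets re.finditer report overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", rna)]


CORE_REPEAT = "GGGGCUG"        # cTNT ISE repeat named in the text
G_RICH_CONSENSUS = "[AU]GGG"   # consensus quoted for the beta-tropomyosin element

# Invented intron fragment for illustration only; not a real gene sequence.
intron = "GUAAGUCGGGGCUGACAUGGGCUUGGGGCUGCA"

core_hits = find_motif(intron, CORE_REPEAT)
consensus_hits = find_motif(intron, G_RICH_CONSENSUS)
print("GGGGCUG starts:", core_hits)
print("[A/U]GGG starts:", consensus_hits)
```

Exact string matching of this kind only locates candidate elements; whether a given copy binds SF1 or enhances splicing would still need to be tested experimentally.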
The binding of SF1 to sequences downstream of the 5′ splice site naturally raises questions about the role of U1 snRNPs in recognizing the 5′ splice site terminating the microexon. Experimentation to address this question indicated the need for U1 snRNP in exon recognition. When U1 snRNPs were depleted from extract, spliceosome assembly on the 3′ splice site of the exon did not occur (data not shown), indicating a general requirement for U1 snRNPs. Interestingly, cleavage of the 5′ terminus of U1 RNA actually increased formation of the ATP-dependent complex A, suggesting a different mode of recognition by U1 snRNPs than the standard base pairing to 5′ splice sites. In a different study using a precursor RNA containing multiple GGG triplets downstream of the 5′ splice site but no copies of GGGGCUG, we have discovered that U1 RNA can effectively base pair to GGG triplets and that this base pairing is resistant to RNase cleavage of the 5′ end of U1 RNA because of the presence of CCU at nt 8 to 10 of U1 RNA (A. McCullough and S. M. Berget, submitted for publication). Coupled with the observation that yeast SF1 (BBP) interacts with U1 snRNP proteins, an attractive model emerges in which U1 snRNPs and SF1 bind to enhancer sequences downstream of the exon, thereby extending the domain of the microexon (Fig. 8).
The observation of SF1 binding to sequences adjacent to the 5′ splice site raises the possibility that SF1 could participate in alternative 5′ splice site or exon recognition. In mammals, SF1 comes in a variety of forms that are expressed differentially in different cell types (4). These different forms are distinguished by different C-terminal proline-rich domains that presumably contain protein-protein interaction motifs. Thus, the finding here may hint at other activities of SF1 yet to be discovered.
The mechanism of exon recognition suggested by our experiments is a variation of the basic theme of initial complex formation thought to occur on constitutive exons and introns in vertebrates. It suggests that individual constitutively recognized exons or introns may utilize either slightly different proteins or binding arrangements of standard proteins. We have recently observed another nonstandard arrangement in a short Drosophila intron that uses SRp54 instead of SF1 to bridge the intron (24). In both this case and the exon studied here, the 3′ splice site polypyrimidine tract that is being stimulated is a sequence that does not resemble a preferred binding site for U2AF65 (37), suggesting that accessory proteins may function to maximize U2AF association with constitutive exons as well as with alternative exons. In vertebrates, there are multiple genes resembling the major form of both subunits of U2AF (42, 46, 48; P. S. McCaw and P. A. Sharp, personal communication), suggesting even further permutations on the set of factors used during initial recognition of splice sites. Thus, there may be considerable variation in the ways in which interactions between factors binding to 3′ and 5′ splice sites can occur in constitutive exons and introns.
ACKNOWLEDGMENTS
We thank Angela Kramer for providing the anti-SF1 antibody and for helpful comments and critical review of the manuscript.
FOOTNOTES
- Received 14 July 1999.
- Returned for modification 25 August 1999.
- Accepted 15 March 2000.
- Copyright © 2000 American Society for Microbiology