Molecular and Cellular Biology, July 2004, p. 5797-5807, Vol. 24, No. 13
0270-7306/04/$08.00+0 DOI: 10.1128/MCB.24.13.5797-5807.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Laboratoire de Biologie Moléculaire Eucaryote du CNRS, UMR5099, IFR109 CNRS, 31062 Toulouse Cedex 4,1 Institut de Génétique Moléculaire, 34000 Montpellier, France,3 Biological Research Center, Hungarian Academy of Sciences, Szeged, Hungary2
Received 26 January 2004/ Returned for modification 23 March 2004/ Accepted 1 April 2004
While in tRNAs, most modified nucleotides are synthesized by protein enzymes, in eukaryotic rRNAs and snRNAs, site-specific synthesis of the most prevalent modified ribonucleotides, the 2'-O-ribose-methylated nucleotides and the pseudouridines, is achieved by two distinct families of ribonucleoproteins (RNPs) (reviewed in references 14, 18, 27, 28, and 52). The modification guide RNPs consist of a sequence-specific guide RNA and a set of common proteins. Each 2'-O-ribose methylation guide RNA carries the conserved C, C' (consensus, RUGAUGA), D, and D' (CUGA) box motifs and possesses one or two 10- to 21-nucleotide-long antisense elements that are responsible for selection of the correct substrate ribonucleotides through the formation of double helices with the target RNAs (11, 32). The selected ribonucleotides are 2'-O-ribose methylated by the Nop1p/fibrillarin methyltransferase enzyme that, in addition to the Snu13 (15.5-kDa), Nop56p, and Nop58p RNP proteins, is associated with all box C/D RNAs (14, 18, 27, 52, 55).
The pseudouridylation guide RNAs are composed of two major hairpin elements that are connected by a hinge and followed by a short tail region (Fig. 1A). The single-stranded hinge and tail region carry the conserved H (consensus, ANANNA) and ACA box motifs that are located at the bases of the 5' and 3' hairpins, respectively (6, 20). Two short antisense elements located in an internal loop of the 5' and/or 3' hairpins provide the sequence specificity for the pseudouridylation guide RNP by base pairing to the sequences that precede and follow the target uridine. This interaction creates the "pseudouridylation pocket," in which the unpaired substrate uridine selected for pseudouridylation is located 14 or 15 bp upstream of the H or ACA motif of the guide RNA (19, 43). The dyskerin/Cbf5p pseudouridine synthase, together with the Nhp2, Nop10, and Gar1 RNP proteins, is an integral component of box H/ACA pseudouridylation guide RNPs (33, 56).
FIG. 1. Schematic structure of box H/ACA RNAs and cDNA construction. (A) Selection of pseudouridylation sites by box H/ACA guide RNAs. For details, see the text. (B) Construction of a cDNA library of human box H/ACA RNAs. HeLa cell RNAs immunoprecipitated by an anti-GAR1 antibody were incubated with a phosphorylated oligoribonucleotide in the presence of T4 RNA ligase. RNA sequences tagged at both termini were converted into double-stranded DNA by a reverse transcription-PCR amplification approach. The amplified DNA was cloned into a plasmid vector, and individual clones were characterized by sequence analysis.
In the yeast Saccharomyces cerevisiae, most, if not all, ribosomal pseudouridines are synthesized by box H/ACA snoRNPs (49). Since only a limited number of box H/ACA RNAs have been identified in humans, it remains unknown to what extent guide RNAs participate in the pseudouridylation of human cellular RNAs (13, 20, 25, 26, 29, 31, 48, 54). A recent identification of partial sequences of several putative box H/ACA snoRNAs in mice suggested that mammalian cells express a large number of box H/ACA RNAs (23). In this study, the identification of 61 novel human box H/ACA RNAs provided us with new insights into the function and organization of the molecular machinery directing the pseudouridylation of human rRNAs, snRNAs, and probably other cellular RNAs.
cells. Plasmid purification and sequence analysis were performed according to standard laboratory protocols (50).
Mapping of pseudouridines. Isolation of RNA from human HeLa cells was performed by the guanidine thiocyanate-phenol-chloroform extraction procedure (21). Detection of pseudouridines in the 18S and 28S rRNAs was performed by primer extension analysis of N-cyclohexyl-N'-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC)-alkali-treated HeLa cell RNAs (5). 32P-labeled oligonucleotides complementary to the human 18S rRNA from positions C238 to U256 (Ψ222), U685 to G700 (Ψ613 and Ψ655), C762 to U784 (Ψ690), and A1374 to C1393 (Ψ1330 and Ψ1351) were used as primers. Mapping of Ψ2496 in the 28S rRNA was performed with a primer complementary to the 28S rRNA from positions A2531 to C2548. For numbering of human 18S and 28S rRNAs, see GenBank accession number U13369. The primer extension products were fractionated on 6% sequencing gels.
Expression constructs. The ACA26, ACA35, and ACA57 scaRNAs were overexpressed in human HeLa cells. To this end, the coding regions of ACA26 (oligonucleotides ACTAATCGATTACATTTTGAAGTTAGTGG and TCTAACGCGTTTGAAATAAGTCAATAAG), ACA35 (oligonucleotides ACTAATCGATTAGACCTGAGATGTGCTTA and TCTAACGCGTACAGTCACTAAAGCCGTA), and ACA57 (oligonucleotides ACTAATCGATGTAAGTCTGCCTGTCCTAT and TCTAACGCGTCTTAGGACGGCCCTCCTA) were PCR amplified with HeLa cell genomic DNA as a template. The amplified fragments were digested with restriction endonucleases ClaI and XhoI and inserted into the same sites of the pCMV-globin expression construct (13). Transfection of HeLa cells was performed with Fugene 6 (Roche) transfection reagent according to the manufacturer's instructions.
Fluorescence in situ hybridization. Synthesis and chemical conjugation of amino-modified oligodeoxynucleotides with FluoroLink Cy3 monofunctional dye (Amersham), fluorescence hybridization of transfected HeLa cells, and image acquisition and processing were performed as described elsewhere (http://singerlab.aecom.yu.edu) (13). The following oligonucleotide probes were used to detect transiently expressed human scaRNAs (asterisks indicate amino-allyl-modified T residues that are sites of attachment for the fluorescent label): ACA26, AT*CAGCAAAGTCTTACTT*CATCAGACTCAGCCT*T; ACA35, TT*CTTAAACCCAGCTAT*CACAACACATCACAAGCCTT*T; and ACA57, GT*GTGTCCTGCCAGACT*ACCCTGTTAGAACT*G. A polyclonal rabbit anti-p80-coilin antibody was kindly provided by A. Lamond. Nuclear DNA was stained with 0.1 µg of 4',6'-diamidino-2-phenylindole/ml.
We identified a total of 1,120 RNA sequences that defined 17 previously identified and 61 novel box H/ACA RNAs. As demonstrated by computer folding (57), the new RNAs displayed all the characteristic hallmarks of box H/ACA RNAs; they folded into a "hairpin-hinge-hairpin-tail" structure and carried H and ACA box motifs (data not shown). Since among the defining structural features of H/ACA RNAs, the presence of an ACA motif or, in a few instances, an AUA motif located 3 nucleotides from the RNA 3' end was noted, the newly identified RNAs were designated ACA RNAs and numbered from 1 to 61. The expression and size of each RNA were confirmed by Northern blot analysis. The majority of recombinant plasmids (62%) carried cDNAs of full-length box H/ACA RNAs. In a few instances, 5'- and/or 3'-extended RNAs that apparently represented processing intermediates of mature box H/ACA RNAs were also identified. About one-fourth of the recombinant plasmids carried cDNAs corresponding to various fragments of the 18S and 28S rRNAs (15%) or representing full-length or partial sequences of the 5S and 5.8S rRNAs (8%). Less frequently (<2%), we also obtained plasmids carrying spliceosomal snRNA, tRNA, and mRNA sequences. Although about 20 box H/ACA RNAs were highly overrepresented in our cDNA library, we identified the sequences of 18 box H/ACA RNAs only once, suggesting that our survey was not saturated.
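The sequence-level criteria mentioned above (an H box matching ANANNA and an ACA or AUA triplet located 3 nucleotides from the 3' end) can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it checks motifs only, ignores the hairpin-hinge-hairpin-tail fold that a real screen must also verify, and the function names and the toy sequence are hypothetical.

```python
import re

# ANANNA consensus for the H box, as quoted in the article.
H_BOX_PATTERN = r"A[ACGU]A[ACGU]{2}A"


def normalize(seq: str) -> str:
    """Upper-case the sequence and read DNA-style T as RNA U."""
    return seq.upper().replace("T", "U")


def find_h_boxes(seq: str) -> list[int]:
    """0-based start positions of all ANANNA matches, overlaps included."""
    rna = normalize(seq)
    # The lookahead keeps the regex from skipping overlapping candidates.
    return [m.start() for m in re.finditer(f"(?={H_BOX_PATTERN})", rna)]


def has_aca_box(seq: str) -> bool:
    """True if ACA (or the rarer AUA) is followed by exactly 3 nt at the 3' end."""
    rna = normalize(seq)
    return len(rna) >= 6 and rna[-6:-3] in {"ACA", "AUA"}


# Hypothetical toy sequence, not one of the ACA RNAs from this study.
toy = "GCGCAGAGAAGCGCACAUUU"
print(find_h_boxes(toy), has_aca_box(toy))
```

Because ANANNA is a loose consensus, such a scan is expected to return many spurious candidates in a real RNA; it is useful only in combination with the predicted secondary structure.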
The computer-predicted two-dimensional structures of the new box H/ACA RNAs were scrutinized to find putative antisense target recognition elements and eventually to identify potential substrate RNAs. Based on their predicted functions, the new box H/ACA RNAs were divided into three groups. As expected, the majority of these RNAs (43 species) have been implicated in guiding the pseudouridylation of the 18S or 28S rRNAs. Another group of RNAs (6 species) have been assigned to the direction of pseudouridine synthesis for the major spliceosomal snRNAs. Finally, 12 box H/ACA RNAs classified in the third group lacked significant complementarities to any known stable cellular RNAs.
Guide RNAs directing the pseudouridylation of 18S rRNA. The human 18S rRNA is estimated to contain about 38 pseudouridine residues, 30 of which have been located either exactly or to within 2 or 3 nucleotides (35, 36). Of the newly identified box H/ACA RNAs, 19 have been predicted to function in 18S rRNA pseudouridylation (Fig. 2). The potential base-pairing interactions formed between these guide RNAs and 18S rRNA sequences perfectly conformed to the structural requirements defined for efficient RNA-guided pseudouridylation reactions (8, 19, 43). In the pseudouridylation pocket, the target uridines occupied an invariant position located 14 or 15 nucleotides upstream of the H or ACA box of the guide RNA.
FIG. 2. Potential base-pairing interactions between box H/ACA RNAs and human 18S rRNA. (A) Selection of known pseudouridylation sites. The upper strands represent box H/ACA RNA sequences in a 5'-to-3' orientation. Solid lines represent the upper parts of the 5' or 3' hairpins of guide RNAs. The ACA motifs are in closed boxes. The first three nucleotides of the putative H motifs are in open-ended boxes. The lower strands represent 18S rRNA sequences in a 3'-to-5' orientation. The positions of pseudouridine residues were reported previously (36). The pseudouridine residues defined by interactions with guide RNAs are indicated (Ψ). The sequence of human 18S rRNA is from GenBank accession number U13369. (B) H/ACA RNAs distinguishing between two or three potential pseudouridylation sites. A bar below rRNA sequences indicates that one of the underlined uridines is pseudouridine. (C) Prediction of new pseudouridylation sites. Question marks indicate novel pseudouridylation sites revealed by identification of the corresponding guide RNAs.
Ψ1629), ACA8 (Ψ1060 and Ψ1085), ACA13 (Ψ1252), ACA20 (Ψ655), ACA24 (Ψ867), ACA25 (Ψ805 and Ψ818), ACA28 (Ψ819 and Ψ870), ACA36 (Ψ109), ACA41 (Ψ1648), and ACA50 (Ψ38 and Ψ109), the selected uridines already had been demonstrated to be pseudouridylated (35, 36) (Fig. 2A). The base-pairing capacity of some other guide RNAs, such as ACA5 (Ψ1242), ACA14 (Ψ970), ACA15 (Ψ1371), ACA31 (Ψ222), ACA36 (Ψ1248), ACA42 (Ψ113 and Ψ576), ACA44 (Ψ826), and ACA60 (Ψ1008), could distinguish between two or three possible pseudouridylation sites that had not been located to nucleotide resolution (36) (Fig. 2B). Finally, the uridine residues selected by the ACA4 (U1351), ACA10 (U214), ACA24 (U613), ACA44 (U690), and ACA46 (U653) putative pseudouridylation guide RNAs had not been reported to be pseudouridylated (36) (Fig. 2C).
Since not all pseudouridines had been located on the human 18S rRNA, the state of pseudouridylation of these uridine residues was examined by the CMC treatment-primer extension procedure (5) (Fig. 3). CMC reacts with N3 of pseudouridine, and the modified CMC-pseudouridine arrests reverse transcriptase 1 nucleotide before the pseudouridylation site. When 32P-labeled sequence-specific oligonucleotides were annealed to CMC-modified 18S rRNA and extended by reverse transcriptase, stop signals were observed 1 nucleotide before the U1351, U214, U613, U690, and U653 residues, indicating that they are pseudouridylated. Primer extension mapping of Ψ690 also revealed that, in contrast to previous reports (35, 36), neither U692 nor U693 is pseudouridylated in HeLa cell 18S rRNA. Likewise, mapping of Ψ222 showed that the 222-UUU-224 region of the human 18S rRNA contains only one and not two pseudouridines, as proposed before (35, 36). Thus, the characterization of new box H/ACA guide RNAs revealed new pseudouridylation sites and defined the correct positions of several previously detected pseudouridines in the human 18S rRNA.
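The position of a CMC-dependent stop on a sequencing gel follows from simple coordinate arithmetic, which the Python sketch below illustrates. It assumes that the primer covers the stated rRNA positions, that reverse transcriptase extends toward the rRNA 5' end, and that "1 nucleotide before" means the last nucleotide copied is the one immediately 3' of the modified residue. The function name is hypothetical, and real band positions also depend on how the gel is calibrated.

```python
def cmc_stop_product_length(primer_low: int, primer_high: int, psi_site: int) -> int:
    """Length (nt) of the cDNA that ends 1 nt before a CMC-modified site.

    primer_low and primer_high are the rRNA coordinates covered by the primer;
    psi_site is the rRNA coordinate of the (putative) pseudouridine.
    """
    if primer_low > primer_high:
        raise ValueError("primer_low must not exceed primer_high")
    if psi_site >= primer_low:
        raise ValueError("site must lie 5' of the primer-binding region")
    last_copied = psi_site + 1  # reverse transcriptase halts at this template position
    return primer_high - last_copied + 1


# Primer windows and sites quoted in the article (18S rRNA numbering).
primers = {
    (238, 256): [222],
    (685, 700): [613, 655],
    (762, 784): [690],
    (1374, 1393): [1330, 1351],
}
for (low, high), sites in primers.items():
    for site in sites:
        length = cmc_stop_product_length(low, high, site)
        print(f"primer {low}-{high}, site {site}: {length} nt")
```

Under these assumptions, the product length reduces to the highest primer coordinate minus the site coordinate, so a stop for a site closer to the primer runs further down the gel.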
FIG. 3. Verification of pseudouridine residues in human 18S and 28S rRNAs predicted by guide RNA-rRNA interactions. CMC-alkali-modified ( ) or control (N) HeLa cell RNAs were analyzed by primer extension with 32P-labeled oligonucleotide primers complementary to the appropriate regions of the human 18S and 28S rRNAs. Lanes A, G, C, and U show dideoxy sequencing reactions performed on recombinant plasmids carrying the human 18S or 28S rRNA genes. Brackets and asterisks indicate uridines that were reported to be pseudouridylated.
Ψ684, Ψ922, Ψ1178 (36), Ψ1330 (Fig. 3), and one not yet placed, are synthesized by protein enzymes. However, it is also possible that pseudouridylation of the human 18S rRNA is achieved entirely by box H/ACA snoRNPs.
Pseudouridylation guide RNAs directing the modification of 28S rRNA. Based on the estimated ratio of its pseudouridine content to its uridine content, the human 28S rRNA was predicted to carry about 57 pseudouridines (22, 35). Later, primer extension mapping located 54 pseudouridines to nucleotide resolution on the human 28S rRNA (45). At the outset of this study, six human box H/ACA guide snoRNAs (U19, U64, U65, U68, E2, and E3) had been implicated in the synthesis of nine pseudouridines in 28S rRNA (19) (see Table S2 in the supplemental material). Our survey identified 26 additional box H/ACA snoRNAs which could be linked to 35 reported pseudouridylation sites in the 28S rRNA (Fig. 4). Another snoRNA, ACA61, was predicted to position the U2496 residue for pseudouridylation. Indeed, primer extension mapping confirmed the presence of a novel pseudouridine at U2496 (Fig. 3), indicating that ACA61 is a genuine pseudouridylation guide RNA. Therefore, in total, 32 human box H/ACA snoRNAs have been implicated in 28S rRNA pseudouridylation. These guide RNAs can select 44 of the 57 pseudouridylation sites present in the human 28S rRNA (see Table S2 in the supplemental material).
FIG. 4. Potential base-pairing interactions between box H/ACA RNAs and human 28S rRNA. The sequence of human 28S rRNA is from GenBank accession number U13369. The positions of pseudouridine residues were reported previously (45). See the legend to Fig. 2 for other details.
Of the 56 guide RNAs implicated in rRNA pseudouridylation, 22 are capable of directing two independent modification reactions. Usually, the two target sites of the "double pseudouridylation guides" are found on the same rRNA and, most frequently, are located close to each other in the primary rRNA sequence. The target pseudouridines selected by the 5' hairpins of double guides can be located either upstream or downstream of the pseudouridylation sites determined by their 3' hairpins. Less frequently, the 5' and 3' hairpins of a few double guides can function in the pseudouridylation of two rRNA species. The U69 snoRNA can direct the pseudouridylation of the 18S and 5.8S rRNAs, while the ACA10 and ACA31 snoRNAs are predicted to function in the modification of the 18S and 28S rRNAs.
Together, the two short helixes formed by the H/ACA guide RNA and rRNA sequences preceding and following the target uridine comprise a minimum of 9 bp or a maximum of 16 bp (Fig. 2 and 4). Most frequently (in 31% of the total instances), the rRNA-H/ACA RNA interaction involves 11 bp. In a few instances, mismatched (ACA24, ACA25, ACA15, and ACA58) or bulging (ACA41) nucleotides are also involved in the predicted rRNA-H/ACA RNA interaction. It is also noteworthy that G-U base pairs frequently occur in the predicted rRNA-H/ACA RNA helices, especially those composed of more than 10 bp.
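The base-pair totals discussed above can be tallied mechanically once the guide and rRNA strands are written in register, as in Fig. 2 and 4. The following Python sketch counts Watson-Crick and G-U pairs separately and treats every other column as a mismatch. The two arms are invented for illustration and do not correspond to any guide RNA in this study; bulged nucleotides are not modelled.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(guide_5to3: str, target_3to5: str) -> tuple[int, int]:
    """Return (Watson-Crick pairs, G-U pairs) for two aligned strands."""
    if len(guide_5to3) != len(target_3to5):
        raise ValueError("strands must be aligned to the same length")
    columns = list(zip(guide_5to3.upper(), target_3to5.upper()))
    wc = sum(pair in WATSON_CRICK for pair in columns)
    gu = sum(pair in WOBBLE for pair in columns)
    return wc, gu


def total_helix_bp(arms: list[tuple[str, str]]) -> int:
    """Sum base pairs over the helices flanking the target uridine."""
    return sum(sum(count_pairs(guide, target)) for guide, target in arms)


# Hypothetical 6-bp and 5-bp arms, invented for illustration only.
arms = [("GCAUGG", "CGUGCU"), ("ACGGU", "UGCUA")]
for guide, target in arms:
    print(guide, target, count_pairs(guide, target))
print("total bp:", total_helix_bp(arms))
```

Keeping the G-U count separate makes it easy to see how much of a long predicted helix depends on wobble pairing.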
Identification of the ACA2 and ACA34 snoRNAs provided us with new insights into the molecular mechanism of the evolution of modification guide RNAs. The ACA34 snoRNA and two sequence variants of the ACA2 snoRNA (ACA2a and ACA2b) are encoded within three different introns of a hypothetical protein gene, FLJ20436 (see Table S2 in the supplemental material). Interestingly, the ACA34 snoRNA shows a strong sequence similarity to both isoforms of ACA2 (66% identity). Therefore, it could be considered a third sequence variant of ACA2. Consistent with this notion, the 3' hairpin of ACA34, similar to those of ACA2a and ACA2b, can position the U4283 residue in the 28S rRNA for pseudouridylation (Fig. 4). However, while the 5' hairpins of ACA2a and ACA2b can direct the synthesis of Ψ4264, the 5' hairpin of ACA34 selects Ψ4270 in the 28S rRNA. Apparently, the ACA2a, ACA2b, and ACA34 snoRNA genes have been generated by subsequent gene duplications during evolution. After the first duplication event, random point mutations in the target recognition motifs of the ACA34 gene or the parental gene of the contemporary ACA2a and ACA2b genes resulted in a novel snoRNA gene with new sequence specificity. The second gene duplication event generated the current ACA2a and ACA2b genes. Of course, random mutations in the ACA2a or ACA2b genes may produce another, functionally distinct pseudouridylation guide RNA gene during future evolution.
Guide RNAs implicated in the pseudouridylation of spliceosomal snRNAs. The human U1, U2, U4, U5, and U6 spliceosomal snRNAs together carry 21 pseudouridines (38). Earlier, we characterized four guide RNAs (U85, U89, U92, and U93) which were predicted to direct the synthesis of four pseudouridines in the U5 (85 and 89) and U2 (92 and 93) snRNAs (see Table S3A in the supplemental material). The current survey identified six additional box H/ACA RNAs implicated in the pseudouridylation of the U6 (ACA12), U2 (ACA26, ACA35, and ACA45), U1 (ACA47), and U5 (ACA57) snRNAs (Fig. 5A). Together, the above-described H/ACA guide RNAs can direct the synthesis of 11 pseudouridine residues in the U1, U2, U5, and U6 spliceosomal snRNAs. This finding indicates that the involvement of box H/ACA guide RNAs in snRNA pseudouridylation is more general than demonstrated before. However, at this time, we cannot exclude the possibility that protein enzymes also contribute to the pseudouridylation of spliceosomal snRNAs (25, 37).
FIG. 5. Putative pseudouridylation guide RNAs directing the modification of spliceosomal snRNAs. (A) Proposed base-pairing interactions between box H/ACA guide RNAs and human spliceosomal snRNAs. (B) Fluorescence in situ localization of guide RNAs transiently overexpressed in HeLa cells. The schematic structure of the pCMV-globin expression construct is shown. The promoter region of cytomegalovirus (CMV), the exons in the human β-globin gene (E1 to E3), the polyadenylation region in the bovine growth hormone gene (PA), and the SP6 promoter are shown. The relevant restriction sites are indicated (H, HindIII; C, ClaI; X, XhoI). Fluorescence in situ hybridization with oligonucleotide probes specific for the ACA26, ACA35, and ACA57 scaRNAs was combined with indirect immunofluorescence with an antibody against the Cajal body marker protein, p80-coilin. The nuclear DNA was stained with 4',6'-diamidino-2-phenylindole (blue). Bar, 10 µm.
In human HeLa cells, the formerly characterized pseudouridylation and 2'-O-ribose methylation guide RNAs involved in modification of the RNA pol II-transcribed U1, U2, and U5 spliceosomal snRNAs were found to specifically accumulate in nucleoplasmic Cajal bodies (13, 24, 26). To further explore the subnuclear organization of the modification machinery of pol II-specific snRNAs, we investigated the localization of the newly identified ACA26, ACA35, and ACA57 RNAs that were implicated in the pseudouridylation of the U2 (ACA26 and ACA35) and U5 (ACA57) snRNAs. Due to the detection limit of fluorescence in situ hybridization, the endogenous ACA26, ACA35, and ACA57 RNAs were not visible in HeLa cells (data not shown). Therefore, to facilitate detection, the ACA26, ACA35, and ACA57 RNAs were transiently overexpressed in HeLa cells by using the pCMV-globin expression construct, which had been developed to study the expression of intronic snoRNAs (13, 30) (Fig. 5B). Upon hybridization with fluorescent oligonucleotides specific for the ACA26, ACA35, and ACA57 RNAs, a few bright foci were observed in the nuclei of transfected HeLa cells. Staining of the same cells with an antibody against the Cajal body marker protein, p80-coilin (3), demonstrated that the foci accumulating the transiently expressed box H/ACA RNAs were Cajal bodies. These results further support the notion that the guide RNA machinery directing the modification of pol II-transcribed spliceosomal snRNAs is sequestered in Cajal bodies and that Cajal bodies are the nuclear locale for RNA-guided modification of pol II-specific spliceosomal snRNAs (13, 24, 26).
Putative pseudouridylation guide RNAs lacking substrate RNAs. Our screen also identified 12 putative pseudouridylation guide RNAs which lacked the potential for guiding pseudouridine synthesis in human rRNAs, snRNAs, snoRNAs, scaRNAs, and YRNAs as well as in the U7, 7SK, MRP, RNase P, telomerase, and 7SL RNAs (see Table S3B in the supplemental material). The function of these so-called "orphan" guide RNAs remains unknown. According to the most obvious scenario, they may direct pseudouridine formation in some not-yet-identified RNAs. Thus, our data indicate that H/ACA RNA-directed RNA pseudouridylation is not restricted to rRNAs and snRNAs and is a more general phenomenon than assumed before. Alternatively, some of the orphan box H/ACA RNAs may also function in other aspects of RNA biogenesis. For example, the human U17 box H/ACA snoRNA and its yeast orthologue, snR30, play an essential role in the nucleolytic processing of 18S rRNA (4, 41). Likewise, a few box C/D snoRNAs play a crucial role in the nucleolytic processing of 18S (U3, U14, and U22) as well as 5.8S and 28S (U8) rRNAs (52). Therefore, it is possible that at least some of the newly identified orphan box H/ACA RNAs function in the nucleolytic processing of rRNAs or other cellular RNAs.
Northern blot analysis revealed that, in general, orphan H/ACA RNAs accumulate in HeLa cells at much lower levels than do snoRNAs involved in rRNA pseudouridylation (data not shown). In line with this observation, the majority of the new orphan RNAs appeared only once in our screen. This finding strongly suggests that the low-abundance orphan H/ACA RNAs, in contrast to the abundant rRNA modification guide RNAs, are highly underrepresented in our cDNA library. On the other hand, box H/ACA RNAs may also possess a tissue-specific expression pattern. The mouse and human HBI-36 snoRNAs that are encoded within introns of the brain-specific serotonin receptor gene accumulate only in brain tissues (10). Therefore, it is possible that many additional low-abundance and/or tissue-specific box H/ACA RNAs remain unidentified in human cells.
Genomic organization of human box H/ACA RNA genes. Database searches revealed that the human genome carries one perfect and sometimes a few additional imperfect copies of the newly identified box H/ACA RNAs (see Tables S1, S2, and S3 in the supplemental material). The genomic loci that matched perfectly our cDNA sequences were considered bona fide H/ACA RNA genes. With no exception, all H/ACA RNA genes were found within introns of active transcription units known to produce spliced mRNAs. Moreover, all bona fide H/ACA RNA genes showed a parallel orientation with their host genes, indicating that the H/ACA RNAs and their host pre-mRNAs are synthesized cotranscriptionally. The majority of the H/ACA host genes are protein-coding genes, although in many instances, the functions of their predicted protein products remain unknown. Frequently, two or even more H/ACA RNAs are encoded within different introns of the same host gene. For example, the MGC5306 gene, which encodes a hypothetical protein, hosts at least six box H/ACA RNAs (ACA1, ACA8, ACA18, ACA25, ACA32, and ACA40) as well as two box C/D snoRNAs (mgh28S-2410 and mgh28S-2412) that are predicted to direct 2'-O-ribose methylation of the human 28S rRNA at positions C2410 and G2412 (unpublished results).
The rRNA pseudouridylation guide RNA genes are frequently located in ribosomal protein genes or in other genes connected to ribosome biogenesis (nucleolin) or protein synthesis (translation initiation and elongation factors and cytoplasmic protein chaperones). Interestingly, the ACA36 and ACA56 genes lie within introns of the dyskerin gene, which encodes the common pseudouridine synthase of box H/ACA RNPs. Another snoRNA, ACA23, is hosted by the importin 7 gene, which encodes an import receptor for ribosomal proteins. Collectively, these observations further corroborate the notion that cotranscription is an important way of coordinating the regulation of factors required for ribosome synthesis and function. Therefore, it is possible that at least some of the snoRNA host genes that lack a known function will be shown to participate in some aspects of protein synthesis. On the other hand, there are also a few host genes, for example, the cytochrome P450 oxidoreductase (ACA14a and ACA14b), methyl-CpG binding domain protein 2 (a DNA methyltransferase) (ACA37), and SRCAP (a transcriptional activator) (ACA30) genes, that cannot be directly linked to ribosome biogenesis or function.
Usually, the host genes of H/ACA RNAs directing the pseudouridylation of spliceosomal snRNAs (see Table S3A in the supplemental material) or lacking potential substrate RNAs (see Table S3B in the supplemental material) cannot be directly connected to ribosome biogenesis or translation. The ACA33 and ACA51 RNAs represent the only exceptions to this rule, since they are encoded within the S12 ribosomal protein and NOP56 box C/D snoRNP protein genes, respectively. The nonribosomal guide RNA genes frequently appear within genes encoding proteins involved in the functional maintenance of the human genome, such as the catalytic subunit of DNA polymerase alpha (ACA12), condensin subunit 1 (U85), nucleosome assembly protein 1-like protein (ACA54), and chromodomain helicase DNA binding protein 4 (ACA57). The functional significance of this observation is still unclear.
Several box H/ACA RNAs are encoded within genes that have little capacity for protein coding. The spliced and polyadenylated mRNA-like products of these genes contain no long open reading frames. Therefore, it appears that the only function of these genes is to express their intronic H/ACA RNAs (53). Of the 75 known human box H/ACA RNA genes, 15 seem to be located within introns of non-protein-coding genes, indicating that cotranscription within non-protein-coding pre-mRNAs is a rather common way to express pseudouridylation guide RNAs. Previously, four non-protein-coding snoRNA host genes were identified and shown to belong to the family of 5'-terminal oligopyrimidine (5'TOP) genes (9, 46, 51, 53). The sequence of 5'TOP mRNAs commences with a 5'-terminal C residue and is followed by a short pyrimidine tract that plays an important role in the upregulation of transcription and translation of 5'TOP mRNAs (2). The family of 5'TOP genes also includes ribosomal protein and translation elongation factor genes. The expression of rRNA modification guide snoRNAs within non-protein-coding 5'TOP pre-mRNAs therefore may provide a regulatory mechanism to coordinate the accumulation of snoRNAs and ribosomal proteins. Inspection of the 5'-terminal sequences of the expressed sequence tags of the newly identified non-protein-coding H/ACA snoRNA host genes revealed that at least two of them, named TOP1 (encoding ACA16, ACA44, and ACA61) and TOP2 (encoding ACA17 and ACA43), belong to the family of 5'TOP genes. Unfortunately, the correct 5' ends of other non-protein-coding H/ACA host genes could not be inferred from their partial expressed sequence tags deposited in databases.
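The 5'TOP criterion described above (a 5'-terminal C followed by a short pyrimidine tract) lends itself to a simple pattern test, sketched here in Python. The minimum tract length is not specified in the text, so the default of 4 pyrimidines is an assumption of this sketch, as are the function name and the example sequences. The result is only meaningful when the input really starts at the transcription start site, which is the difficulty noted above for partial expressed sequence tags.

```python
import re


def is_5top(transcript_5prime: str, min_tract: int = 4) -> bool:
    """True if the sequence starts with C followed by >= min_tract pyrimidines.

    min_tract is an assumed threshold; the article only says "short".
    """
    rna = transcript_5prime.upper().replace("T", "U")
    pattern = rf"C[CU]{{{min_tract},}}"  # e.g. C[CU]{4,} for the default
    return re.match(pattern, rna) is not None


# Hypothetical 5'-terminal sequences, invented for illustration only.
for seq in ("CUUUCCUGAG", "ACUUUCCUGA", "CUUAGGCCUU"):
    print(seq, is_5top(seq))
```

Making the threshold a parameter keeps the assumption visible and lets the stringency be adjusted to whatever definition of the tract is adopted.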
Besides the bona fide genes, most box H/ACA RNAs possess one or more imperfect genomic copies. These defective copies often represent 5'- or 3'-truncated versions of the authentic H/ACA RNA gene, indicating that they are apparently pseudogenes. Consistent with this conclusion, the sequences of the truncated H/ACA genes cannot be folded into the characteristic secondary structures of box H/ACA RNAs. Other genomic copies represent full-length H/ACA RNAs, but they contain numerous point mutations, short internal deletions, and/or insertions. These genomic sequences frequently fail to fold into a perfect hairpin-hinge-hairpin-tail structure or lack functional H and/or ACA motifs, suggesting that they do not code for functional RNAs. Supporting this notion, the defective H/ACA sequences are frequently located in transcriptionally silent genomic loci or, alternatively, lie within known transcription units but in an opposite orientation. Both truncated and full-length pseudogene sequences are often followed by 8- to 15-nucleotide-long oligo(A) tracts and sometimes are flanked by short perfect repeats, indicating that they have been generated by retrotransposition (54). Finally, full-length H/ACA RNA sequences carrying a few point mutations are also found within introns of known genes in a parallel orientation. Since the sequences of these genes fold into the hairpin-hinge-hairpin-tail structure, they likely represent functional H/ACA RNA genes expressing sequence variants of the characterized RNAs.
Conclusions. In this study, we identified 61 human box H/ACA snoRNAs and scaRNAs. The majority of these RNAs (49 species) are predicted to function as guide RNAs in the synthesis of 71 pseudouridine residues in the human 18S and 28S rRNAs and the U1, U2, U5, and U6 spliceosomal snRNAs. Some of the new H/ACA RNAs (12 species) lack potential target sites in human rRNAs, snRNAs, and snoRNAs. These orphan H/ACA RNAs either direct the pseudouridylation of some not-yet-identified RNAs or function in other aspects of cellular RNA biogenesis. As predicted by their genomic organization, all human box H/ACA RNAs are processed from pre-mRNA introns. The host genes of human H/ACA RNAs can be divided into three major groups. Most of them encode well-characterized proteins that frequently function in ribosome biogenesis or protein synthesis. Another group of H/ACA RNA host genes are predicted to encode proteins with unknown functions. Finally, the host genes of many H/ACA RNAs lack apparent protein-coding capacity and frequently belong to the family of 5'TOP genes.
A.M.K. was funded by a short-term EMBO fellowship, a Hungarian State Eötvös fellowship, and l'Association pour la Recherche contre le Cancer. This work was supported by grants from CNRS, la Ligue Nationale contre le Cancer, and the Hungarian Research Foundation (OTKA, T31738).
Supplemental material for this article may be found at http://mcb.asm.org/.