Physiologie Moléculaire de la Cellule, IBMM, Université Libre de Bruxelles (ULB), Rue des Pr. Jeener et Brachet 12, 6041 Gosselies, Belgium,1 Conformation des Macromolécules Biologiques, ULB, CP 263, Boulevard du Triomphe, 1050 Bruxelles, Belgium,2 Machine Learning Group, ULB, CP 212, Boulevard du Triomphe, 1050 Bruxelles, Belgium3
Received 16 June 2006/ Returned for modification 24 July 2006/ Accepted 16 January 2007
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
|
GAAC is mediated by the Gcn4 transcription factor (57). This protein is most active in cells starved of amino acids, a situation in which the Gcn2 protein (eIF2α kinase) causes the translation of Gcn4 mRNA to be derepressed (56). Other conditions also promote increased Gcn4 synthesis (139). Gcn4 activates the expression of a large number of genes involved in amino acid biosynthesis. Furthermore, in the absence of any amino acid deficiency, a basal level of Gcn4 mRNA translation is required for the transcription of several genes (e.g., HIS3, ARG4, ARG3, and ARO3) (57). Two genomic studies have aimed to identify the whole set of GAAC target genes (68, 90). In one of them (90), 539 genes whose expression is activated in a Gcn4-dependent manner in amino acid-starved cells were listed.
Many different amino acids can be used as general sources of nitrogen (Fig. 1). Most amino acids present in the external medium are detected by yeast cells via a membrane-associated sensor complex (Ssy1-Ptr3-Ssy5 [SPS]) made of three proteins, including Ssy1, an amino acid permease homolog devoid of transport activity (15, 46). This SPS complex in turn activates the transcription of several amino acid permease genes via the Stp1, Stp2, and Uga35/Dal81 transcription factors (1, 5, 63). Putative additional target genes of this transcriptional control system have emerged from several genomic studies aiming to list all genes induced by amino acids in a Ssy1-, Stp1-, and/or Stp2-dependent manner (43, 45, 73).
Like the transcriptional controls acting on genes involved in nitrogen anabolism, those regulating genes involved in transport and catabolism are of two types: specific ones affecting only a limited number of genes and nitrogen catabolite repression acting on a wide variety of genes (26, 84, 136). Thus, several nitrogenous compounds, such as arginine (39), proline (18), serine, threonine (59), urea, allantoin (25), γ-aminobutyric acid (GABA) (121), and the aromatic amino acids (62), specifically induce the transcription of the genes involved in their utilization. On the other hand, nitrogen catabolite repression (NCR) is typically exerted on the many genes involved in the utilization of nonpreferential nitrogen sources when a good nitrogen source (e.g., ammonium, glutamine, and asparagine) is available in the medium (26, 84, 136). NCR in fact acts through the inhibition of two transcription factors of the GATA family (Gln3 and Gat1/Nil1) which typically bind to upstream 5'-GATA-3' core sequences and activate gene transcription alone or in conjunction with inducer-specific transcription factors (26, 84). The Gln3 and Gat1 factors are thus most active under limiting nitrogen supply conditions (e.g., when cells grow on poor nitrogen sources like urea and proline) and are also transiently activated upon the addition of rapamycin to nitrogen-rich media (26, 84). Rapamycin inhibits the Tor proteins, which are proposed to govern the inhibition of Gln3 and Gat1 under good nitrogen supply conditions (9). The Tor-dependent inhibition of Gln3 involves the Ure2 protein (28), whereas the repression of Gat1-dependent expression under good nitrogen supply conditions is also dependent on Gzf3/Deh1/Nil2, another GATA family transcription factor (23, 109, 119). A fourth GATA factor encoded by the DAL80/UGA43 gene (27) also acts as an inhibitor of Gat1 in specific gene contexts but is specifically active under poor nitrogen supply conditions (4, 27, 31). Transcription of the GAT1, GZF3, and DAL80 genes is under the control of all four GATA factors. A network of auto- and cross-regulation systems thus links these four key transcriptional regulators of NCR target genes (12, 23, 109, 119).
Several studies have focused on identifying in the complete yeast genome the genes subject to NCR or regulated by the GATA factors or those whose expression is activated under nitrogen starvation or by rapamycin (7, 14, 29, 112, 116, 138).
In this study we used a systematic approach to examine the influence of nitrogen on the yeast transcriptome. To this end, we compared the expression levels of the 5,690 yeast genes in cells growing on 21 distinct sources of nitrogen. This analysis has enabled us to identify more than 500 nitrogen-regulated genes and to derive a general scheme representing the status of each nitrogen-sensitive transcriptional control according to the nitrogen source. It has further enabled us to associate new genes with several nitrogen-sensitive regulons and to propose a function related to nitrogen metabolism for several nitrogen-regulated orphan genes. Our results offer a novel and complementary view of how yeast cells adapt their transcriptome and metabolism to the nitrogen supply.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Σ1278b strain (8) (Table 1). Cells were grown in a minimal buffered (pH 6.1) medium with 3% glucose as the carbon source and various nitrogen sources (Table 2). To nitrogen source-free medium, described in reference 66, each of the following was added as the sole nitrogen source: 10 mM urea (reference medium), (NH4)2SO4, alanine, arginine, asparagine, aspartate, citrulline, GABA, glutamate, glutamine, isoleucine, leucine, methionine, ornithine, phenylalanine, proline, serine, threonine, tyrosine, or valine or 5 mM tryptophan. Comparative analysis of the influence of nitrogen on the expression of the nitrogen-sensitive GAP1 gene in cells growing in the nonbuffered yeast nitrogen base medium versus the citrate-buffered medium (pH 6.1) used in this study revealed similar responses to the quality of the nitrogen source (see Fig. S1 in the supplemental material). The gcn2Δ strain was constructed by the PCR-based gene deletion method (134). The DNA segment used to introduce this mutation was generated with the kanMX2 gene from plasmid pFAa-kanMX2 (81) as a template and the D5-GCN2 and D3-GCN2 PCR oligonucleotide primers (Table 3). Yeast strain 23344c (ura3) was transformed with the PCR fragment by the lithium method described previously (49). Transformants were selected on rich medium containing 200 µg/ml G418 (Geneticin; GIBCO BRL, Gaithersburg, MD). The GAP1::lacZ fusion in plasmid pFL38 has been previously described (119).
|
|
|
RNA preparation and qRT-PCRs. Total RNA was purified as previously described (75). Quantitative reverse transcription-PCRs (qRT-PCRs) were used to measure the mRNA levels of the following genes: ACT1, YGL258W, ZAP1, ARO9, ESBP6, ARO80, SNQ2, UGA4, AMD2, and MAE1. For this we used the RT-RTCK05 and RT-SN10-05 kits (Eurogentec, Liège, Belgium) with the following primers: ACT1-left, ACT1-right, YGL258W-left, YGL258W-right, ZAP1-left, ZAP1-right, ARO9-left, ARO9-right, ESBP6-left, ESBP6-right, ARO80-left, ARO80-right, SNQ2-left, SNQ2-right, UGA3-left, UGA3-right, AMD2-left, AMD2-right, MAE1-left, and MAE1-right (Table 3).
Generation and analysis of microarray data. To purify mRNAs, we used the poly(dT) Oligotex kit (QIAGEN, Westburg, The Netherlands). In each assay, 5 µg mRNA was converted to a labeled cDNA target with the Fairplay indirect labeling kit (Amersham-Pharmacia-Biotech, Gent, Belgium) as previously described (137). Microarrays corresponding to the genome of S. cerevisiae strain S288C were produced by Eurogentec (Liège, Belgium) (D290C and G250E series) (122). Cy3- and Cy5-labeled cDNA targets were combined in equal amounts (0.5 µg), vacuum dried (SpeedVac centrifugation), and resolubilized in 50 µl hybridization buffer composed of DIG Easy Hyb solution (Roche Diagnostics, Vilvoorde, Belgium) with 1 mg/ml salmon DNA. Hybridization with the solution of CyDye-labeled cDNA was performed at 42°C for approximately 24 h. Following hybridization, the slide was washed first in 2x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) for 30 s, then in 0.1x SSC-0.1% sodium dodecyl sulfate for 5 min, and finally twice in 0.1x SSC for 5 min. It was then immediately dried by centrifugation (8 min at 800 x g).
The hybridized microarray was scanned with a GMS418 fluorescence reader (Genetic MicroSystems, Woburn, MA) with a resolution of 10 µm. The slide was scanned twice to get the Cy5 and Cy3 signals, once with a high-photo multiplier tube (PMT) gain and once with a low-PMT gain. Signal quantification for each probe on the microarray was performed with GenePix 4.01.17 image acquisition software (Axon Instruments, Union City, CA). Spots with a diameter greater than 210 µm or smaller than 80 µm were considered low-quality spots, as were spots having a median pixel intensity minus mean pixel intensity of more than 40% of the median pixel intensity for each channel and those having less than 95% of their spot pixels be more than 2 standard deviations above background in either the green or the red channel. Low-quality spots were excluded from further analysis. Intensity values from high-PMT-gain pictures were used, except in the case of saturated spots. In the latter case, intensity values from low-PMT-gain pictures were used after scale correction. Intensity-dependent within-tip group and scale normalization were applied as described in reference 140 using Bioconductor tools (48). Fluorescence ratios were computed on the basis of hybridization signals normalized with background corrections. Experiments were carried out independently twice, with dye swapping. For each experiment, we calculated for each gene the value M as log2(expressionM.NS/expressionM.urea) (where expressionM.NS is the level of expression of the gene on minimal medium with the considered nitrogen source at the concentration specified in Table 2 and expressionM.urea is that on minimal medium with urea). Genes that could not be measured (because of a low-quality spot) in at least one of the duplicate experiments were not considered in further analyses. Figure S2 in the supplemental material compares for each medium the results obtained in the two experiments. 
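The M value defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `m_value` and `gene_m_values` are hypothetical, the inputs are taken to be already normalized, background-corrected intensities, and a low-quality spot is represented by `None`.

```python
import math

def m_value(expr_ns, expr_urea):
    """Return M = log2(expression on the test nitrogen source / expression on urea)."""
    return math.log2(expr_ns / expr_urea)

def gene_m_values(replicates):
    """Compute one M value per replicate experiment for a single gene.

    `replicates` is a list with one entry per experiment: either a pair
    (expr_ns, expr_urea) or None when the spot was flagged as low quality.
    Following the rule in the text, a gene that is missing in at least one
    replicate is dropped (None is returned).
    """
    if any(rep is None for rep in replicates):
        return None
    return [m_value(ns, urea) for ns, urea in replicates]
```

With this convention, a gene expressed four times more strongly on the test medium than on urea has M = 2, and a gene expressed at half the urea level has M = -1.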
The P value was calculated on the basis of a normal distribution of the SAM (significance analysis of microarrays) test statistic S, where S = Mg/(c + SDg) (Mg and SDg are, respectively, the mean and the standard deviation of the M values for gene g, and c is the 90th percentile of the SDg values) (30). It indicates the confidence level at which a gene can be considered differentially expressed on a given medium (M.NS) compared to the reference medium (M.urea). For each medium, we considered genes having a P value below 1/5,690 (5,690 is the total number of genes considered) to be differentially expressed. This threshold was chosen so that no more than one false positive would be expected per medium.
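The selection procedure can be illustrated with a short sketch. Several points are assumptions rather than details given in the text: the statistic is read in the usual SAM form, S = Mg/(c + SDg); the normal distribution is fitted to the S values of all genes on a medium; the P value is taken as two-sided; the sample standard deviation is used; and the percentile uses linear interpolation. All function names are hypothetical.

```python
import math
import statistics

def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def sam_scores(m_by_gene):
    """Map each gene to S = mean(M) / (c + sd(M)), with c the 90th percentile of the sd values."""
    means = {g: statistics.mean(m) for g, m in m_by_gene.items()}
    sds = {g: statistics.stdev(m) for g, m in m_by_gene.items()}
    c = percentile(list(sds.values()), 90)
    return {g: means[g] / (c + sds[g]) for g in m_by_gene}

def two_sided_p(s, mu, sigma):
    """Two-sided tail probability of s under a normal distribution N(mu, sigma^2)."""
    z = abs(s - mu) / sigma
    return math.erfc(z / math.sqrt(2))

def differentially_expressed(m_by_gene, n_genes=5690):
    """Return the genes whose P value falls below 1/n_genes on one medium."""
    scores = sam_scores(m_by_gene)
    values = list(scores.values())
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    cutoff = 1.0 / n_genes
    return {g for g, s in scores.items() if two_sided_p(s, mu, sigma) < cutoff}
```

Adding the constant c to the denominator keeps genes with very small replicate variance from receiving inflated scores, which is the usual motivation for this form of the statistic.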
For genes not expressed on urea but highly expressed on other nitrogen media, M values are typically greater than 1 under all conditions for which a ratio could be calculated. Fifteen of the 506 differentially expressed genes are concerned: ARO9, BAP2, BAP3, SED1, ADH4, YGL258W, YPS5, DOG1, SPL2, YHR213W, YPS6, DAN1, YLL053c, ALD3, and YOR387C (see Table S1 in the supplemental material). The computed ratios calculated for these genes must be considered qualitative rather than quantitative. They were nevertheless considered in the data analysis, as they reflect a high level of expression on specific nitrogen media. Analysis of the distribution of gene expression levels on the 21 tested nitrogen media indicates that the number of unexpressed or poorly expressed genes is not particularly higher on urea than on the other media (see Fig. S3 in the supplemental material). Urea is thus an appropriate choice for the reference nitrogen source.
Clustering methods. Hierarchical clustering was performed using TIGR MultiExperiment Viewer (111) on the 390 genes whose expression varies significantly under at least one nitrogen condition and for which an expression ratio could be computed for at least 13 media (N value = 13). The data presented here are those obtained using the complete linkage method and the average dot product as a measure of the distance between gene expression profiles. The tree was finally segmented into eight clusters, with the distance threshold between genes considered to be 0.137 in TIGR's MultiExperiment Viewer. Other settings (N value, clustering method, distance, distance threshold) were also tested and evaluated by comparing the obtained gene clusters with predefined lists of genes. Those which were finally chosen globally resulted in the best overlaps with several lists of nitrogen-regulated genes and thus optimal physiological coherence without too much loss of information. A second hierarchical clustering was performed on genes belonging to cluster 3 using the complete linkage method and Pearson's correlation. The resulting subtree was segmented into four subclusters, with a distance threshold between genes considered to be 0.548 in TIGR's MultiExperiment Viewer.
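As an illustration of the clustering step, the sketch below implements complete-linkage agglomeration with a distance cutoff in plain Python. It is not the TIGR MultiExperiment Viewer implementation: the distance used here is 1 minus Pearson's correlation (the measure named for the second clustering), the "average dot product" distance of the first clustering is not reproduced, and the function names and the toy threshold are hypothetical.

```python
import math

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles.

    Profiles must not be constant, otherwise the correlation is undefined.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(vx * vy)

def complete_linkage_clusters(profiles, threshold, distance=pearson_distance):
    """Merge clusters until the closest pair is farther apart than `threshold`.

    `profiles` maps a gene name to its list of M values. The distance between
    two clusters is the largest distance between any pair of their members
    (complete linkage), so stopping at `threshold` is equivalent to cutting
    the full tree at that height.
    """
    clusters = [[g] for g in profiles]

    def linkage(c1, c2):
        return max(distance(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

This naive version recomputes all pairwise linkages at each step, which is adequate for a few hundred genes but would be slow for a whole genome.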
Gene list comparisons. Comparisons of groups or subgroups of coregulated genes to predefined gene lists (about 300 in total) were performed using the compare classes utility provided by the Regulatory Sequence Analysis Tools website (http://rsat.ulb.ac.be/rsat/) (130). To check the significance of the overlap between two lists, overlapping P values were computed on the basis of the following hypergeometric formula:
![]() | (1) |
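For readers who want to reproduce an overlap P value, a standard hypergeometric upper tail can be computed as sketched below. The exact notation of equation 1 is not recoverable here, so the parameter names (`n_universe`, `n_query`, `n_reference`, `n_overlap`) and the function name are assumptions, and the size of the gene universe to use is not stated in this section.

```python
from math import comb

def overlap_p_value(n_universe, n_query, n_reference, n_overlap):
    """Probability of observing at least `n_overlap` common genes by chance.

    A query list of `n_query` genes is drawn without replacement from a
    universe of `n_universe` genes, of which `n_reference` belong to the
    reference list. The result is P(X >= n_overlap) for a hypergeometric X.
    """
    upper = min(n_query, n_reference)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_reference, i) * comb(n_universe - n_reference, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

Exact integer binomial coefficients are used so that very small P values are not lost to rounding before the final division.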
Regulatory sequence analyses. To analyze upstream noncoding sequences, we used the collection of software tools provided by the web resource Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/) (130).
Sequence retrieval. Upstream sequences of all the yeast genes were retrieved over 800 bp upstream from the start codon. When the upstream open reading frame (ORF) is closer than this distance, a shorter sequence is retrieved, which allows us to discard coding sequences.
NCR-related motifs. We used the program oligo-analysis (131) to detect overrepresented oligonucleotides in the promoter sequences of the 41 genes annotated as NCR sensitive (A-NCR genes). This analysis was performed for all oligonucleotide sizes between 5 and 8, leading to a total of 56 significantly overrepresented oligonucleotides. Quite consistently, most of these motifs were variants of the GATA box, and the most significant among them was the canonical GATA box GATAAG. To these 56 discovered motifs, we added the auxiliary GATA box (GATTA), and six pairs of GATA boxes separated by a region from 0 to 60 base pairs. The program dna-pattern was used to count the occurrences of the 63 motifs in each yeast gene promoter.
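The counting step can be sketched as follows. This is an illustrative stand-in, not the dna-pattern program itself: counting overlapping matches and scanning both strands are assumptions, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA motif written with A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def count_motif(promoter, motif, both_strands=True):
    """Count occurrences of `motif` in `promoter`, overlapping matches included.

    When `both_strands` is True, matches to the reverse complement are added,
    unless the motif is its own reverse complement (to avoid double counting).
    """
    promoter, motif = promoter.upper(), motif.upper()

    def count(m):
        width = len(m)
        return sum(
            1 for i in range(len(promoter) - width + 1) if promoter[i:i + width] == m
        )

    n = count(motif)
    rc = reverse_complement(motif)
    if both_strands and rc != motif:
        n += count(rc)
    return n
```

Applying such a counter to each of the 63 motifs yields, for every promoter, a vector of counts that can serve as input to a classifier.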
Discriminant analysis. We applied linear discriminant analysis to classify genes into two classes (NCR versus not NCR) on the basis of the pattern counts (the complete data set is available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). As a positive training set (NCR), we used the 41 genes previously annotated as NCR sensitive (A-NCR genes) (see Table S2 in the supplemental material). Since we had no reliable negative set for the training (i.e., genes not regulated by NCR), we applied the same strategy as that described previously (117) and randomly selected a set of 123 (3 x 41) genes in the yeast genome. Since the number of variables (63 in total) is greater than the number of genes in the positive training set, we applied forward stepwise selection to select the subset of variables giving the most accurate classification. The efficiency of a classification was estimated using leave-one-out cross-validation. After this phase of training and variable selection, the discriminant function was applied to each yeast gene to estimate its posterior probability of being NCR sensitive and to assign it to a class (NCR or not NCR). The whole process was repeated 10 times with different negative groups in order to reduce the fluctuations due to random selection. A list of 100 genes predicted to be subject to NCR was finally obtained (see Table S3 in the supplemental material).
Nucleotide sequence accession number. The microarray data set has been deposited at the Gene Expression Omnibus resource (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE4861.
| RESULTS |
|---|
|
|
|---|
Σ1278b during growth on a minimal medium containing 21 different single nitrogen sources, including urea used as a reference condition (Table 2). The 21 compounds were selected from a list of 27 nitrogen sources for their ability to support reproducible growth with a generation time of less than 5 h. The reasons for choosing urea as a reference nitrogen source to which the 20 other media were compared for genome expression were that urea catabolism and its regulation are well known (25), the generation time of strain Σ1278b on urea is near the middle of the studied generation time range (Table 2), and the major regulation systems like NCR and SPS-mediated control do not operate on this medium.
Strain Σ1278b was selected for this work because it is the reference strain of many previous studies of nitrogen regulation, including those having led to the concepts of NCR (136) and pseudohyphal growth induced by limiting ammonium supply to diploid cells (50). Furthermore, strain Σ1278b displays the negative regulatory effect exerted by ammonium on the expression of many genes involved in the use of poor nitrogen sources and on the activities of enzymes and permeases encoded by these genes (136). This negative control is less pronounced or even absent in strain S288C (whose genome has been sequenced) and its derivatives (106). This regulation by ammonium is also exerted in a diploid strain obtained by crossing Σ1278b with S288C, showing that the particular behavior of S288C with respect to ammonium is recessive (85).
The medium was buffered at pH 6.1, and the cells were harvested during exponential growth after at least 10 generations and at low cell density (10^6 cells per ml). We could thus consider that changes in medium composition were minimal during cell culture and that cells were harvested in a balanced state of growth. These precautions ensured highly reproducible growth conditions. The mRNA samples were hybridized to microarrays containing 5,690 known or predicted genes defined after reannotation of the yeast genome on the basis of comparative genome analyses of closely related Saccharomyces species (88, 122). Statistical analysis of collected data (see Materials and Methods) enabled us to identify 506 genes displaying significantly different levels of expression on at least one of the test media and on urea (see Table S1 in the supplemental material). This list of 506 genes was compared to multiple predefined gene lists, including those of the Gene Ontology and MIPS functional categories (see Materials and Methods). As expected, it was found to be highly enriched in genes involved in nitrogen and amino acid and/or sulfur metabolism or in genes whose expression is known to be under nitrogen regulation (the complete data set is available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). Among the gene lists compared to the 506 nitrogen-regulated genes are those including the stress-responsive genes. The expression of these genes is typically up-regulated (between 200 and 300 genes) or down-regulated (between 400 and 600 genes) in response to a wide range of environmental changes, with this general control being variously named the environmental-stress response (ESR) or the common environment response (CER) (20, 47). For instance, the majority of genes repressed in response to environmental disturbances code for proteins involved in protein biosynthesis, notably, ribosomal proteins (20, 47).
Although yeast cells grew at different rates according to the nitrogen source supplied, we observed no significant variation in levels of expression of the protein synthesis machinery genes from one medium to another (P value = 0.99 [see Fig. S4 in the supplemental material]). It thus seems that the ESR/CER is not differentially active on the 21 tested nitrogen media. This result most likely reflects the fact that, in our experiments, the cells were harvested during steady-state growth, without any significant perturbation in the medium.
We next used the transcriptome data set to classify both the nitrogen media and the genes and to identify the main transcriptional control circuits active on each tested medium.
Classification of nitrogen sources. The gene expression data were used to establish a classification of the 20 tested nitrogen sources versus urea. For this, we started from the 506 genes and discarded those whose expression versus that on urea could not be computed on a significant number of nitrogen sources, e.g., because their expression level was too low or could not be measured. We thus selected 390 of the 506 genes displaying a significant expression level on at least 13 of the 20 media tested. We then applied hierarchical clustering (see Materials and Methods) to their expression values and derived a nitrogen source classification tree based on the average dot product and the complete linkage method (Fig. 2A). We compared this result with those obtained using other metrics and/or tree construction methods. It appeared that two main groups, together containing 14 nitrogen sources, were classified similarly by the majority of techniques applied. Group A (Fig. 2, left part of the tree) includes asparagine, glutamine, and ammonium, known to be good nitrogen sources supporting rapid growth (generation time, 2 h). Interestingly, serine also appears in this group and the corresponding generation time is one of the shortest (generation time, 2 h 13 min). Group A also includes aspartate, alanine, arginine, and glutamate, on which the generation time is below 3 h. It is noteworthy that transamination or deamination of the nitrogenous compounds of group A yields pyruvate or Krebs cycle intermediates (α-ketoglutarate, oxaloacetate), directly assimilable by cell metabolism (Fig. 1). On the media containing nitrogen sources not belonging to group A, the generation time exceeds 3 h. The only exception is the medium containing GABA, which supports fast growth despite significant differences at the level of the yeast transcriptome. Group B (Fig. 2, right part of the tree) includes leucine, isoleucine, methionine, threonine, tryptophan, and tyrosine. In contrast to the group A nitrogen sources and except for leucine, these nitrogen sources support slow growth (generation time, >4 h). Moreover, it is known that the catabolism of these compounds leads to nonmetabolizable products that the cell must excrete and from which fusel oils derive (115, 132, 135). Finally, the classification of the remaining nitrogen sources, namely, valine, phenylalanine, ornithine, proline, GABA, and citrulline, depends strongly on the clustering technique applied. This is why we were unable to associate these compounds with any group on the basis of transcription data. Surprisingly, this classification also indicates that arginine, ornithine, and citrulline are not similar as regards yeast growth or their effects on the transcriptome. Yet all three are urea cycle intermediates degraded largely via the same pathway (136) (Fig. 1). Likewise, among the aromatic amino acids, phenylalanine affects the growth and transcriptome of yeast differently from tryptophan and tyrosine. This is also true of valine, which behaves differently from the other two branched-chain amino acids, leucine and isoleucine.
|
Below, we present the data obtained by hierarchical clustering based on the average dot product metric (see Materials and Methods). The resulting tree was segmented into eight clusters gathering 140, 14, 119, 37, 21, 23, 6, and 30 genes (Fig. 2A and see Table S1 in the supplemental material). These clusters of genes were systematically compared to predefined gene lists (see Materials and Methods; data available at http://dbm.ulb.ac.be/PhysCell/data/Godard.htm). The largest cluster (cluster 1, 140 genes) significantly overlapped the lists of NCR target genes. We thus describe below how the expression of this first group of genes varies according to the nitrogen conditions. The next sections are devoted to the analysis of the other large clusters. The four remaining smaller clusters significantly overlapped with gene lists of more-specific regulations (e.g., genes inducible by GABA or repressed by methionine), which are considered in the second part of Results.
Behavior of known NCR target genes. We established a list of genes shown by classical molecular biology studies to be sensitive to NCR, considering a gene to be subject to this regulation if its expression level responds to at least one positive transcriptional regulator (GATA transcription factor Gln3 or Gat1) and to at least one negative one (Dal80, Gzf3, or Ure2) related to NCR. For example, the ZRT1 gene was not included in the list because, although its expression depends on the positive factor Gln3, it does not seem to be controlled by any NCR-related negative regulator (29). We thus propose a list of 41 A-NCR target genes (see Table S2 in the supplemental material). We obtained results for only 34 of the 41 A-NCR genes because ASP3-1, ASP3-2, ASP3-3, and ASP3-4 are not present in the genome of strain Σ1278b (136) and because we were unable to measure any significant expression of the genes ATG14, BAT2, and GDH3 under most conditions tested. Furthermore, two A-NCR genes (VID30 and PEP4) failed to show significant differential expression under our conditions. We thus focused our analysis on the 32 remaining A-NCR genes.
Figure 3A shows that the expression profiles of these 32 A-NCR genes are not all similar. Closer analysis led us to subdivide these genes into two categories. One comprises A-NCR genes which are subject to other transcriptional regulations in addition to NCR. For instance, the UGA1 and UGA4 genes are specifically inducible by GABA (121) and the CAR1 gene by arginine (40). Likewise, DAL4, DAL5, DAL7, DUR1,2, and DUR3 are inducible by allophanate (a product of urea degradation) (25) and PUT1 and PUT2 by proline (18). Other genes in this first category are subject to transcriptional regulations acting on wider sets of genes. This is the case for AGP1, whose expression is induced by extracellular amino acids via the SPS system (63). Likewise, GDH2 is subject to GAAC via transcription factor Gcn4 (57). Yet the expression of several of these inducible genes tends to be lower on nitrogen media that support optimal growth (Fig. 3A), which is consistent with the previous observation that the basal expression of these genes (i.e., monitored in inducer-free media) is subject to NCR.
|
≥2 h but ≤2 h 15 min), these nitrogen sources exert a lesser NCR effect than do asparagine, glutamine, and serine. Alanine, arginine, and GABA support slower growth (generation time, ≥2 h 20 min but ≤2 h 45 min) than glutamate or aspartate but exert equally strong NCR. When the nitrogen source supports growth at a generation time ranging from 3 to 4 h (this is the case for valine, proline, and phenylalanine), NCR is weaker than on the above-mentioned media, and when the generation time exceeds 4 h, there appears to be no NCR except on ornithine (generation time, 4 h 32 min). Surprisingly, as mentioned above, NCR is minimal on urea, despite the intermediate generation time (3 h 35 min). We can thus generally and qualitatively associate three generation time intervals, <3 h, ≥3 h but <4 h, and ≥4 h, with three levels of NCR: strong, weak, and essentially absent. However, as shown by the exceptions highlighted above (ammonium, glutamate, aspartate, ornithine, urea), there is no perfect correlation between the growth rate observed with each nitrogen source and the degree of NCR. This point has been discussed in a previous review (24).
Towards an exhaustive list of NCR target genes. As mentioned above, hierarchical clustering analysis of gene expression data defined a group of 140 coexpressed genes (cluster 1) (see Table S1 in the supplemental material). These 140 genes include 26 of the 32 A-NCR genes found to be differentially expressed in our study (overlap P value = 4.5 x 10^-34; sig = 31.6). The six remaining genes were classified in other clusters of coexpressed genes as described below. The group of 140 genes likely includes other NCR target genes in addition to the 26 A-NCR genes. To identify these genes, we first compared the 140 genes with the lists of genes generated by two independent studies aimed at identifying in the whole genome genes controlled by the GATA transcription factors. One of these lists contains 83 yeast genes proposed on the basis of an analysis of ChIP-chip and genome expression data (including those reported in reference 116) to be associated with at least one of the four GATA transcription factors (7); these 83 genes include 17 of the 41 A-NCR genes (overlap P value = 2.3 x 10^-22; sig = 19.8). The other list comprises 91 genes whose transcript levels are reported to be positively controlled by the Gln3 and Gat1 GATA factors (112); it includes 28 of the 41 A-NCR genes (overlap P value = 1 x 10^-44; sig = 42.2). We thus compared our group of 140 genes with those two lists (Fig. 4A). Of the 16 genes found in all three studies, 4 do not appear on the list of A-NCR genes (GLT1, VBA1, DIP5, and YDR090C), nor do 26 of the 38 genes identified by two of the three studies (Fig. 4A).
These 30 genes (4 plus 26) are thus highly probable novel NCR target genes (P-NCR) (Fig. 4B). Accordingly, the average expression profile of these P-NCR genes (Fig. 4C) is quite similar to that described for the A-NCR genes (Fig. 3B), although the amplitudes of differential expression according to the nitrogen source are smaller. We thus propose an updated list of 71 (41 plus 30) genes whose expression appears to be controlled by NCR (see Table S2 in the supplemental material).
|
We used qRT-PCR to measure the transcript levels of 20 of the 44 P-NCR genes in the wild type and ura3 gzf3 dal80 mutant grown on glutamine. In this triple mutant strain, the nitrogen catabolite repression exerted on Gln3- and Gat1-dependent transcription is largely relieved (26, 119). We have obtained data for 18 genes, and in all cases, the gene was found reproducibly derepressed in the mutant, confirming their sensitivity to NCR (see Table S4 in the supplemental material). About two-thirds of the 44 P-NCR gene products are known to be involved in nitrogen metabolism, and some of them have previously been shown to be expressed to a lower level under good nitrogen supply conditions (see Table S2 in the supplemental material). Other P-NCR genes code for previously studied proteins which do not seem to be directly associated with nitrogen metabolism. The remaining P-NCR genes and three of the A-NCR genes have not been functionally characterized to date. Different methods of protein sequence comparison were applied to most of these proteins to try to infer a putative function. Among other data provided by this analysis (for details, see Table S2 in the supplemental material), we identified two probable amino acid racemases (YIR030C/DCG1 and YGL196W) and a putative vacuolar amino acid transporter (YDR090C). Furthermore, ORF YIL167W has been annotated as a pseudogene coding for part of a serine/threonine deaminase homolog (Sdl1) in strain S288C, with the adjacent gene YIL168W coding for the second part of this enzyme. We found that both YIL167W/SDL1 and YIL168W emerge as P-NCR genes similarly expressed in strain Σ1278b on all tested nitrogen sources. In closely related Saccharomyces species (S. paradoxus, S. mikatae, and S. bayanus), these two ORFs are fused and the corresponding gene probably encodes a functional protein (70). Similarly, the NIT1 gene, coding for a protein similar to nitrilases (98), appears to correspond to two adjacent ORFs, YIL164C and YIL165C, which are similarly expressed in our experiments. These two ORFs are also fused in S. paradoxus, S. mikatae, and S. bayanus (70), suggesting that they code for a functional enzyme in these species. We have sequenced the YIL167W/SDL1-YIL168W and NIT1/YIL164C-YIL165C ORFs isolated from strain Σ1278b used here. We found that, contrary to the situation in strain S288C, neither ORF is interrupted by any stop codon. This strongly suggests that both YIL167-168W/SDL1 and NIT1/YIL164-165C code for functional proteins in strain Σ1278b. The strain with the latter genotype thus appears suited for functional analysis of these genes. Finally, two other P-NCR genes (LEE1/YPL054W and YOR052C) code for proteins of unknown function that contain predicted zinc finger motifs. These genes are described in more detail in the last section of Results. More details on the primary sequences of all P-NCR gene products are available in Table S2 in the supplemental material.
GAAC. Among the groups of genes found to be coexpressed on the various test media, cluster 4 and cluster 5 include 37 and 21 genes, respectively (see Table S1 in the supplemental material). Both of these clusters contain a significant number of genes subject to the general control of amino acid biosynthesis (GAAC): 7 (cluster 4) and 5 (cluster 5) are indeed among the 37 GAAC target genes originally defined on the basis of classical molecular studies (57) (respective overlap P values = 1.2 × 10⁻⁹ and 9.9 × 10⁻⁸; sig = 7.13 and 5.20), 16 (cluster 4) and 15 (cluster 5) are on the list of 539 Gcn4 target genes defined on the basis of whole-genome transcript analyses (90) (respective overlap P values = 6.1 × 10⁻⁹ and 1 × 10⁻¹²; sig = 6.42 and 7.13), and 15 (cluster 4) and 14 (cluster 5) are among the 187 genes found by ChIP-chip analyses (binding P value < 10⁻³) to be associated with Gcn4 (53) (respective overlap P values = 3.4 × 10⁻¹⁴ and 2.3 × 10⁻¹⁷; sig = 10.98 and 14.15). In addition, analysis of the sequences located upstream from the 37 and 21 genes reveals that the 5'-GAGTCA-3' sequence is significantly overrepresented among them (sig = 2.77 and 1.98) (130). This consensus motif corresponds to UASGCRE, the Gcn4 binding site (95). Clusters 4 and 5 thus contain a subset of genes whose expression is controlled by this transcription factor. Interestingly, the genes of these groups are typically expressed to a higher level on leucine, isoleucine, methionine, threonine, tryptophan, and tyrosine than on urea, suggesting that GAAC is activated in cells growing on the group B nitrogen sources (Fig. 5). Accordingly, the average expression profiles of genes identified as GAAC targets in two previous studies (57, 90) reveal higher expression levels on the above-mentioned media (see Fig. S5 in the supplemental material). To ascertain the importance of GAAC on group B media, we examined the growth of a gcn2Δ mutant strain on all 21 nitrogen sources tested in our study. GCN2 is a key gene of GAAC, coding for the eIF2α kinase required to derepress the translation of Gcn4 mRNA (58). We found that the deletion of GCN2 specifically reduces growth when the nitrogen source is a group B compound (Fig. 6). Hence, GAAC is not only more active but also important for optimal growth on group B media. Yeast growing on these nitrogen sources is thus characterized by a lack of NCR, slow growth, and a more active GAAC (Fig. 2B).
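The motif overrepresentation reported above starts from a simple count of how often the hexamer occurs upstream of each gene. The following is a minimal sketch of that counting step only, assuming that occurrences on both strands are counted; the function names and the two promoter sequences are hypothetical, and the sig score of the cited tool (130) is not reproduced here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def count_motif_both_strands(seq, motif):
    """Count (possibly overlapping) occurrences of a motif on either strand."""
    seq = seq.upper()
    motif = motif.upper()
    # A match to the reverse complement on the given strand is a match
    # to the motif itself on the opposite strand.
    patterns = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in patterns)


# Hypothetical upstream sequences, for illustration only.
promoters = {
    "geneA": "AAGAGTCATTTGACTCAA",
    "geneB": "ACGTACGTACGT",
}
counts = {name: count_motif_both_strands(seq, "GAGTCA")
          for name, seq in promoters.items()}
print(counts)
```

Comparing such counts with those expected from a background set of upstream regions is what turns them into a significance score.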
We consider that other genes in subcluster 3-1 might be SPS targets, displaying an expression profile resembling that of AGP1, GNP1, and MUP1. Three previous studies based on whole-genome microarray hybridization have aimed to identify all yeast genes up-regulated by external amino acids in an SPS-dependent manner (43, 45, 73). In another study, the binding sites for the Stp1 and Stp2 transcription factors were mapped on the whole-genome scale in ChIP-chip experiments (53). Interestingly, although these reports propose lists comprising between 22 and 72 target genes, the overlap between them is mostly limited to the few amino acid permease genes mentioned above. The same applies to the overlap between these lists and the 15 genes of subcluster 3-1. This suggests that the number of genes up-regulated by amino acids via the SPS system might be more limited than generally thought, possibly mainly to amino acid permease genes.
Influence of nitrogen on the UPR regulon. A second subcluster (subcluster 3-2) comprises 45 genes which are expressed to a higher level on media containing a good nitrogen source (asparagine, glutamine, serine, ammonium, aspartate, arginine, or glutamate) than on media containing a poor one (Fig. 7B). The expression profile of these genes is thus opposite to that of the NCR target genes. Among these 45 genes, 21 belong to the list of 381 targets of the UPR as defined by a previous whole-genome transcript analysis (125) (overlap P value = 1.4 × 10⁻¹⁴; sig = 12). KAR2, a well-known UPR target (93, 108) that does not appear in this list of 381 genes, also belongs to subcluster 3-2. The expression of UPR target genes typically increases when incorrectly folded proteins accumulate in the secretory pathway. This response can also be induced by exposing cells to dithiothreitol, a powerful reducing agent preventing the formation of disulfide bridges, or to tunicamycin, an inhibitor of N-glycosylation reactions. The transcription factor Hac1 and its regulator Ire1 are the two main protein mediators of the UPR (83, 99, 120). Using yMGV (the yeast Microarray Global Viewer) (77, 89), we compared the average expression profile of the 45 genes of subcluster 3-2 with that of the 85 above-defined NCR target genes in a wide set of whole-genome transcript analyses. We found these two gene groups to display opposite expression profiles, notably in two series of experiments aimed at inventorying stress-responsive genes (47). In that study, the NCR target genes were activated under nitrogen starvation and in the stationary phase of growth, while the 45 genes of subcluster 3-2 showed reduced expression under these conditions (see Fig. S6 in the supplemental material).
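An overlap P value of this kind can be read as the probability of finding at least the observed number of list members in a cluster by chance. The sketch below assumes a one-sided hypergeometric test and a background of 6,000 genes; neither the test nor the background size is stated in this section, so the function name, the `GENOME_SIZE` constant, and the resulting number are illustrative only.

```python
from math import comb


def overlap_p_value(genome_size, list_size, cluster_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(genome_size, list_size, cluster_size)."""
    upper = min(list_size, cluster_size)
    # Number of ways to draw a cluster containing i list members, summed over i.
    favourable = sum(
        comb(list_size, i) * comb(genome_size - list_size, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(genome_size, cluster_size)


GENOME_SIZE = 6000  # assumption: approximate number of yeast genes in the background

# Counts taken from the text: 21 of the 45 genes of subcluster 3-2
# appear on the list of 381 UPR targets.
p = overlap_p_value(GENOME_SIZE, list_size=381, cluster_size=45, overlap=21)
print(f"{p:.1e}")
```

The exact value depends on the background size chosen, which is why the result should be compared with the published figure only in order of magnitude.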
Influence of urea on the ZAP regulon. Subcluster 3-3 comprises 42 genes showing lower expression on urea than on any other nitrogen source (Fig. 7C). Surprisingly, we found among these genes 16 of the 46 established targets of the Zap1 transcription factor (overlap P value = 5.1 × 10⁻²⁵; sig = 22.5). Expression of these 46 genes is activated by the Zap1 transcription factor in zinc-deprived cells (82). Interestingly, most of the 26 other genes of subcluster 3-3 also showed Zap1-dependent derepression (82), even though they lack the binding site for this factor in their respective promoter regions. We measured by qRT-PCR the expression of two known Zap1 target genes, YGL258W and ZAP1 itself, on urea and valine media. The results confirm that both genes are expressed to a lower level on urea than on valine (see Fig. S7 in the supplemental material). We also increased the zinc ion concentration from 0.05 µM (the Zn²⁺ concentration in the minimal medium used in our experiments) to 2.5 µM (the Zn²⁺ concentration in standard yeast nitrogen base medium); under these conditions, neither of the two genes is expressed to a significant level on either urea or valine medium (data not shown). The Zap1 regulon is thus active under our growth conditions but less so when urea is the sole nitrogen source. Yet we observed no growth defect on media containing as little as 0.05 µM Zn²⁺ (data not shown), indicating that the zinc ion concentration in the buffered minimal medium used in our experiments is not growth limiting. We do not understand the effect exerted by urea on the expression of Zap1-controlled genes. Other studies have likewise revealed links between nitrogen metabolism and cellular zinc ion homeostasis. Expression of the Zap1 target gene ZRT1, coding for a plasma membrane zinc transporter, depends on the GATA transcription factor Gln3 (29). The expression of two other Zap1 target genes, ZRT2 and ZRC1, coding for two zinc transporters located, respectively, at the plasma membrane and the vacuolar membrane (44), is also subject to this control (112). Yet the relationship between zinc ion homeostasis and nitrogen metabolism remains unclear.
Genes most highly expressed on methionine and threonine. Subcluster 3-4 comprises 17 genes showing higher expression on all media tested than on urea and their highest expression on methionine and threonine (Fig. 7D). Interestingly, eight of these genes encode proteins displaying a mitochondrial localization, according to annotations in the SGD (42).
Transcriptional regulations active on a single or a few nitrogen sources. So far we have focused our analysis on 390 genes which are differentially expressed according to the nitrogen source, classifying them by clustering analysis. This has enabled us to identify several transcriptional regulations acting on large sets of genes and the nitrogen supply conditions under which these global controls operate (see Discussion). We next analyzed in detail the 506 genes over- or underexpressed on any medium compared to their expression on urea. For this, we again applied the hierarchical clustering technique, this time to the groups of genes which are up- or down-regulated on any of the 20 nitrogen sources versus urea. For several nitrogen sources such as asparagine, glutamine, ammonium, aspartate, alanine, and valine, the identified genes appear on lists of targets of the above-described transcriptional controls, e.g., NCR. Hence, none of these sources induced any detectable specific transcriptional regulation. For other nitrogen sources, there emerged groups of coregulated genes that were not identified in the above analysis and showed activated or repressed expression on only a limited number of nitrogen sources. These genes, described below, are targets of more-specific transcriptional regulations triggered by the nitrogen sources concerned.
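For readers unfamiliar with the technique, the grouping of genes by similarity of their expression profiles can be sketched as follows. This is a minimal illustration only, assuming average linkage and a Pearson correlation distance; the linkage, the distance, the gene names, and the log ratios are all hypothetical and are not taken from this study.

```python
from math import sqrt
from statistics import mean


def pearson_distance(x, y):
    """1 - Pearson correlation; undefined for a profile with zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return mean(pearson_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical log ratios (nitrogen source versus urea) on four media.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 0.0],
    "gene_b": [2.0, 4.0, 6.0, 0.5],
    "gene_c": [3.0, 1.0, 0.0, 2.0],
    "gene_d": [6.0, 2.0, 1.0, 4.0],
}
groups = average_linkage(profiles, n_clusters=2)
print(groups)
```

A correlation-based distance groups genes by the shape of their profile across media rather than by the magnitude of the change, which suits the search for coregulated genes.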
Transcriptional responses triggered by aromatic amino acids. Five genes show higher expression on phenylalanine, tryptophan, and tyrosine than on urea (see Fig. S8 in the supplemental material). As expected, these include ARO9 and ARO10, coding, respectively, for the transaminase and the decarboxylase involved in degrading these amino acids (62, 133) and known to be induced by aromatic amino acids via the Aro80 transcription factor (62). Among the three other genes displaying a similar expression profile, ESBP6 encodes a mitochondrial protein sharing sequence similarities with monocarboxylic acid transporters, but its function remains unknown (87). Another such gene is YDR379C-A, a gene adjacent to ARO10/YDR380W and separated from it by 701 bp. YDR379C-A and ARO10/YDR380W are divergent and thus share the same promoter region. The function of the YDR379C-A gene is unknown, and the sequence of its product is not indicative of any particular function. Lastly, ARO80 itself is expressed to a higher level on tryptophan than on urea, suggesting that Aro80 also controls the expression of its own gene. The promoter regions of these five genes have all been found to associate with the Aro80 transcription factor (53). In accordance with the view that Aro80 acts on ESBP6 and ARO80 in addition to ARO9 and ARO10, the upstream noncoding regions of these genes contain the cis-regulatory ARO upstream activation sequence through which Aro80 activates gene transcription (62). To verify that the ESBP6 and ARO80 genes are regulated by the Aro80 transcription factor, we used qRT-PCR to measure their transcript levels and that of ARO9 on urea medium with or without added tryptophan, tyrosine, or phenylalanine. The strains tested were a wild-type strain and an aro80 mutant (Fig. 8A). In these experiments, the Aro80-dependent induction of ARO9 and ESBP6 expression was observed in the presence of each aromatic amino acid. The same is true of ARO80, but the effect was much less pronounced. 
We also compared the growth of a strain lacking ESBP6 with that of the corresponding wild type in all media tested in this work but found no difference (data not shown).
Transcriptional responses triggered by methionine, leucine, and isoleucine. Surprisingly, the five Aro80 target genes also show higher expression on methionine, leucine, isoleucine, and threonine than on urea (see Fig. S8 in the supplemental material). As the catabolism of these amino acids remains only partially characterized, our observation raises the interesting possibility that the genes of the ARO regulon might also be involved in the catabolism of these amino acids.
Eight other genes showed lower expression on methionine than on any other tested medium. Six of them are among the eight target genes of the Cbf1-Met4-Met28 transcription complex mediating the repression of transcription in response to methionine (124), and the two others are also known to be repressed by methionine (52, 123). The two remaining Cbf1-Met4-Met28 target genes (MET10 and MET2) are also expressed to a lower level on methionine medium in our study but are not among the 506 genes displaying significant variation of expression on one or more tested nitrogen sources from that on urea.
Transcriptional responses triggered by arginine. The CAR1 and CAR2 genes, encoding arginase and ornithine transaminase, respectively, are up-regulated on arginine, whereas the ARG1, ARG3, ARG4, and ARG8 genes, encoding enzymes involved in arginine anabolism, are repressed on arginine medium. This is in accordance with the mechanisms of mutual exclusion of anabolism and catabolism of this amino acid (39). No other gene displaying arginine-dependent variation of expression was identified. The CAR1 and CAR2 genes are also highly expressed on citrulline and required for citrulline catabolism. Surprisingly, CAR1 is also more highly expressed on ornithine, isoleucine, leucine, methionine, threonine, tryptophan, and tyrosine even though arginase is not involved in the catabolism of any of these nitrogen sources and despite the fact that the arginine biosynthesis genes are GAAC activated only on group B nitrogen sources. A positive effect of these compounds on CAR1 expression has been observed previously, under conditions of nitrogen repression (41). The mechanism causing the higher expression of CAR1 under these conditions remains unknown.
Transcriptional responses triggered by proline, citrulline, and glutamate. The PUT1 and PUT2 genes (encoding, respectively, proline oxidase and Δ¹-pyrroline-5-carboxylate dehydrogenase) are involved in proline catabolism and are targets of the Put3 transcription factor activated by intracellular proline (18, 35, 36, 60). As expected, both genes show higher expression on proline than on the other tested media. They also show higher expression on citrulline, and increased PUT2 expression is also observed on arginine and ornithine. Recent work has shown that the MCH5 gene encoding a riboflavin transporter (105) is also induced by proline in a Put3-dependent manner (Juergen Stolz, personal communication). Accordingly, MCH5 is more highly expressed on proline medium and also on citrulline, arginine, and ornithine. That citrulline, ornithine, and arginine have a positive effect on the expression of the PUT regulon is likely due to the fact that the catabolism of these amino acids leads to the formation of Δ¹-pyrroline-5-carboxylate, which is converted into proline (19). Apart from the PUT and MCH5 genes, no other gene displaying significant proline-dependent variation of expression was found.
CIT2 and DLD3 (encoding, respectively, a citrate synthase and a D-lactate dehydrogenase) show lower expression on glutamate, proline, and citrulline than on the other nitrogen sources. These two genes are subject to the retrograde (RTG) control (22, 79). This regulation is mediated by the Rtg1 and Rtg3 transcription factors, which are inhibited by glutamate and responsible for the expression of genes encoding enzymes involved in α-ketoglutarate synthesis. Other known RTG targets, i.e., CIT1 (citrate synthase), IDH1, IDH2 (isocitrate dehydrogenase), and ACO1 (aconitase), show the same profile of inhibition by glutamate, but their P value does not meet the threshold set for selecting differentially expressed genes.
Transcriptional responses triggered by GABA. Five genes show higher expression on GABA medium than on the other media (see Fig. S10 in the supplemental material). Among them, the genes UGA1 (GABA transaminase), UGA2 (succinate semialdehyde dehydrogenase), and UGA4 (GABA permease) are known to be induced specifically by GABA, via the Uga3 and Uga35/Dal81 transcription factors (3, 104, 121). The other two genes, AMD2 and MAE1, are thus new potential targets of this regulation. To test this hypothesis, we used qRT-PCR to measure their expression, and that of UGA4 as a control, in a wild-type strain and a uga3Δ mutant growing on urea with or without added GABA (Fig. 8B). We observed Uga3-dependent induction by GABA of the expression of both genes. AMD2 codes for a putative amidase (21), and MAE1 encodes a mitochondrial malic enzyme catalyzing the oxidative decarboxylation of malate to pyruvate (16). It is known that the uga1 and uga2 mutants are unable to grow on GABA as the sole nitrogen source, and the same is true of a uga4 mutant if the strain lacks the two other GABA permeases (Gap1 and Put4) (3, 104). We deleted the AMD2 and MAE1 genes in the Σ1278b strain, but the resulting mutants displayed no growth defect on any of the tested media, including GABA medium (data not shown).
Transcriptional responses triggered by serine and threonine. Two genes show much higher expression on serine and threonine than on the other nitrogen sources (see Fig. S11 in the supplemental material). The first, CHA1, encodes a serine/threonine deaminase and is required for growth when the nitrogen source is one of these amino acids (101). Expression of this gene is known to be induced by threonine and serine via the Cha4 transcription factor (59). The second gene, MMF1, encodes a mitochondrial protein of unknown function. This protein is required to maintain the mitochondrial genome and for isoleucine biosynthesis (71, 97). Orthologs of MMF1 are found in all domains of life. The ortholog in Escherichia coli, tdcF, belongs to the tdcABCDEFG operon. Interestingly, several genes of this operon are involved in serine and threonine catabolism in this organism (54). This suggests that MMF1 is involved in serine and/or threonine degradation in yeast. Yeast also possesses a paralog of this gene,