Differential and Inefficient Splicing of a Broadly Expressed Drosophila erect wing Transcript Results in Tissue-Specific Enrichment of the Vital EWG Protein Isoform

ABSTRACT In this report, we document an unusual mode of tissue-enriched gene expression that is primarily mediated by alternative and inefficient splicing. We have analyzed posttranscriptional regulation of the Drosophila erect wing gene, which provides a vital neuronal function and is essential for the formation of certain muscles. Its predominant protein product, the 116-kDa EWG protein, a putative transcriptional regulator, can provide all known erect wing-associated functions. Moreover, consistent with its function, the 116-kDa protein is highly enriched in neurons and is also observed transiently in migrating myoblasts. In contrast to the protein distribution, we observed that erect wing transcripts are present at comparable levels in neuron-enriched heads and neuron-poor bodies of adult Drosophila. Our analyses show that the erect wing transcript consists of 10 exons and is alternatively spliced and that a subset of introns are inefficiently spliced. We also show that the 116-kDa EWG protein-encoding splice isoform is head enriched. In contrast, bodies have lower levels of transcripts that can encode the 116-kDa protein and greater amounts of unprocessed erect wing RNA. Thus, the enrichment of the 116-kDa protein in heads is ensured by tissue-specific alternative and inefficient splicing and not by transcriptional regulation. Furthermore, this regulation is biologically important, as an increased level of the 116-kDa protein outside the nervous system is lethal.

Most eukaryotic primary RNA transcripts undergo posttranscriptional processing requiring splicing of introns. The best-appreciated regulatory outcome of posttranscriptional processing is alternatively spliced transcripts that differ in the coding exons or have distinct 3′ or 5′ untranslated ends (reviewed in reference 32). A second consequence of posttranscriptional regulation is the modulation of amounts of specific transcripts dependent on differential splicing efficiencies of different splice sites. It is generally thought that differences in cell-type-specific splicing machineries result in cell type-enriched or -specific alternative splicing (5,19). In addition, efficiency of splicing could play a major role in gene regulation as primary transcripts that are not completely processed are generally not transported to the cytoplasm and are unlikely to code functional proteins (6,21,23). We decided to investigate the role of alternative and inefficient splicing in the regulation of the Drosophila erect wing (ewg) gene, as previous studies indicated a complex transcript profile, intron-containing cDNAs, as well as poly(A)+ transcripts with retained introns (9).
The Drosophila ewg gene provides a function that is vital in the nervous system and essential to the development of certain muscles (16). EWG protein contains an unusual DNA binding domain that is homologous to sea urchin P3A2 protein (4,10), zebrafish Nrf (3), and mammalian transcription factors NRF-1 and initiation binding receptor (13,18,31). Our previous studies suggested that the ewg primary transcript may be alternatively spliced, since the ewg gene has several introns and its Northern pattern shows multiple transcripts that are tissue and developmental stage modulated (15). However, at the protein level, only one major polypeptide, a 116-kDa, 733-amino-acid-long polypeptide encoded by the SC3 cDNA open reading frame (ORF), was observed in immunoblot analysis, although many other cross-reacting bands were also observed (9,10). The translation start site of the SC3 ORF is an unconventional CTG codon, suggesting that translational regulation may be an important aspect of ewg regulation (10). Transgenes expressing the 116-kDa EWG protein provide compelling evidence that the 116-kDa protein is the major functional protein, as expression of the 116-kDa protein in the neurons rescues lethality and general expression rescues both lethal and muscle phenotypes associated with ewg alleles (8,10). An antibody generated against the 116-kDa EWG protein selectively labels all neurons in the embryonic and larval stages and certain migrating myoblasts in early pupae (8-10), suggesting a distinct tissue-specific expression of the protein and possibly transcript.
We investigated the splicing patterns of ewg RNA to address if ewg transcripts are indeed alternatively or inefficiently spliced and if the pattern of splicing shows tissue-specific differences. In this paper, we report the results on ewg splicing, using reverse transcription (RT)-PCR in head and body RNAs as representative of neuron-enriched and neuron-poor tissue, respectively. Our results show the following. (i) ewg is more widely transcribed than previously recognized, and total ewg RNA levels in heads and bodies are comparable. (ii) A subset of ewg introns are efficiently spliced, but another subset are inefficiently spliced and retained in poly(A)+ RNA. (iii) ewg RNA in bodies has a greater representation of unprocessed RNAs and of RNAs that include two exons that are not part of the SC3 ORF. One of these new exons is not included in ewg transcripts present in heads. (iv) SC3 ORF RNA is enriched in adult heads but low in the bodies. (v) Modest expression of the SC3-encoded ORF in the body can be lethal. Thus, ewg, which is widely transcribed, is primarily regulated by posttranscriptional mechanisms.

MATERIALS AND METHODS
Fly stocks and genetic crosses. Drosophila melanogaster flies were raised on standard media at 25°C. The Canton-S strain was used as the wild type. EWG HS and EWG NS are two white+-marked transgenes that encode the 116-kDa EWG protein isoform under the control of the heat shock hsp promoter and the neuron-specific elav promoter, respectively (8). ewg l1 is a lethal, protein-null allele of the ewg gene (10). Df(1)cin-arth uncovers several loci inclusive of cin, ewg, and y (14). Dp243 is a free duplication derived from Dp1187 which is y+ and also carries a P element marked with rosy (35).
Crosses to test rescue of the ewg deletion by the transgene consisted of crossing females of genotype Df(1)cin-arth w a v Of f/FM7a; EWG NS to y; ry; Dp243 y+, ry+ males. Df(1)cin-arth w a v Of f/Y; Dp243 y+ males have a synthetic deletion of the ewg locus. Males of the genotype Df(1)cin-arth y w a v Of f/Y; EWG NS/+; Dp243 y+ ry+ were found at the expected frequency, while flies without EWG NS4 did not survive.
RT-PCR. Total RNA was isolated using Trizol reagent (GIBCO-BRL) from heads and bodies of 2-day-old adults. After DNase I (GIBCO-BRL) treatment, RT of 1 µg of total RNA was primed with an oligo(dT) or gene-specific probe, using a Superscript II cDNA synthesis kit (GIBCO-BRL) according to the manufacturer's instructions except that the RNA was kept at 50°C for 5 min before initiation of the RT reaction. The manufacturer's instructions were followed to synthesize cDNAs primed by random hexamers. The RNase H step was omitted. Controls were done with no RNA and no reverse transcriptase. The sequence of the gene-specific probe is 5′-ACACTGTTCCATCGCTGTTCGT-3′, which hybridizes to exon H. Cycle parameters for the PCRs were 30 s at 95°C, 40 s at 60°C, and 45 s at 72°C for 30 cycles, with an initial 2 min at 95°C and a final 8-min extension at 72°C. All PCRs were carried out in 50 µl, 8 µl of which was loaded on agarose gels. The Mg2+ concentration was optimized for each primer pair. Taq polymerase was from GIBCO-BRL, and PCR conditions were according to the manufacturer's instructions. Primers were used at a final concentration of 4 ng/µl. Primer positions are outlined in Fig. 1B, and their sequences are shown in Table 1. cDNA and genomic sequences were used for primer design. Primer3, a web-based software program by Rozen and Skaletsky (27a), was used to assist in primer design. Identity of PCR bands was determined by restriction digests, internal primer PCRs, and/or direct sequencing. Direct sequencing was done with ABI automated sequencing equipment.
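As a rough sanity check on the RT-PCR parameters quoted above, the sketch below computes the length and GC fraction of the exon H gene-specific primer and the total programmed hold time of the cycling protocol. The helper names `primer_stats` and `program_hold_seconds` are illustrative and do not come from the paper, and the hold time ignores thermocycler ramping, so it is a lower bound rather than a run time.

```python
# Sanity checks on the RT-PCR parameters quoted in the text.
# Helper names are illustrative assumptions, not taken from the paper.

def primer_stats(seq):
    """Return length and GC fraction of a DNA oligo written 5' to 3'."""
    seq = seq.replace(" ", "").upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"non-ACGT characters: {sorted(unexpected)}")
    gc = seq.count("G") + seq.count("C")
    return {"length": len(seq), "gc_fraction": gc / len(seq)}


def program_hold_seconds(initial, steps, cycles, final):
    """Sum of programmed hold times in seconds (ramping is ignored)."""
    return initial + cycles * sum(steps) + final


# Gene-specific RT primer in exon H, as given in the text.
RT_PRIMER_EXON_H = "ACACTGTTCCATCGCTGTTCGT"

stats = primer_stats(RT_PRIMER_EXON_H)

# 2 min at 95 C, then 30 cycles of 30 s / 40 s / 45 s, then 8 min at 72 C.
hold_s = program_hold_seconds(initial=120, steps=(30, 40, 45), cycles=30, final=480)

print(stats)
print(f"programmed hold time: {hold_s / 60:.1f} min")
```

The same `primer_stats` call can be applied to any primer sequence once stray whitespace from line wrapping has been removed.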
Sequencing of ewg l1. ewg l1 y w sn embryos were collected 24 h after egg laying and selected by the y marker after dechorionation. DNA was extracted by homogenizing ~50 embryos in 100 mM NaCl-10 mM Tris-HCl (pH 7.5)-1 mM EDTA. The homogenate was incubated in 1% sodium dodecyl sulfate (SDS)-1 mg of proteinase K per ml for 16 h at 55°C, phenol-chloroform extracted, and precipitated. A 2.4-kb genomic fragment spanning exons B to D was amplified by PCR with primers In1F and In3cR, using Pwo polymerase (Boehringer Mannheim). This fragment was then reamplified by using primers In1F/In2R and 2F1/e4R and sequenced on both strands. The sequence was compared to both cDNA and genomic DNA sequences, which match.
Protein expression. EWG protein was divided into three fragments for expression: exons B and C (EH1), exon D (EH2), and exons F to H containing J (EM3). Additionally, we made two fragments containing overlapping parts: exons B to D (EH4) and exons D to H containing J (EH5). Fragments were amplified from the SC3 cDNA, using the following primers containing an XbaI site in the return primer for cloning: 5′-CTGGCCACCACAAGCTATC-3′ (e23F) and 5′-GCTCTAGATCAGTTATTGCTGTTGCCCGTC-3′ (e23R) for EH1, 5′-CAACCGCAGCAGGTGAAT-3′ (e4F) and 5′-GCTCTAGATCAATCAACATCGCTGAGCGTAA-3′ (e4R) for EH2, and 5′-TACACCACGCAAACGGTC-3′ (e6-8F) and 5′-GCTCTAGATCAGCTCCAGCTATTGTTCCAT-3′ (e10R) for EM3. EM3 was cloned into pMal-c2 (New England Biolabs), using the XmnI and XbaI sites in pMal, yielding an N-terminal fusion to maltose binding protein (MBP). The remaining fragments were cloned into pSG05 via the SnaBI and NheI sites (17), yielding an N-terminal His tag, since we were unable to clone these fragments with the pMal system. Protein expression was done as described previously (17) for the EH fusions and according to the manufacturer's instructions for the EM fusions.
Immunoblot analysis. Drosophila protein extracts were prepared and resolved as previously described (30). Drosophila embryos were 14 to 18 h old. Bacterial extracts were prepared by pelleting the bacteria and dissolving them in 2× sample buffer. Proteins were resolved by SDS-polyacrylamide gel electrophoresis at 12.5% and 8% SDS for the bacterial extract and Drosophila extracts, respectively. Anti-EWG antibody (10) was used at 1:5,000 for immunoblot analysis of bacterial extracts and at 1:1,500 for Drosophila extracts. A peroxidase-conjugated goat anti-rabbit secondary antibody (Amersham) was used at a dilution of 1:2,000, and blots were developed by chemiluminescence (LumiGLO; Kirkegaard & Perry).
Nucleotide sequence accession numbers. The cDNA sequence (accession no. L11345) and genomic sequence (accession no. AF135590) have been submitted to GenBank.

RESULTS
Figure 1A represents our current understanding of the exon/intron structure of the ewg gene. This map is based on sequences of ewg genomic DNA (15), two cDNAs, SC3 and MPA-1 (9, 10), that shared a common ORF, referred to as the SC3 ORF, consisting of exons B, C, D, F, G, H, and J, and the RT-PCR analysis presented in this paper (see below). The SC3 cDNA differed from MPA-1 in that it contained part of intron 1 and lacked the noncoding exon A (Fig. 1A).

Splicing of known ewg exons differs in adult heads and bodies.
To characterize the splicing of ewg RNA, we used head and body RNAs, since differences between these tissues based on Northern patterns were expected (15). The splicing profile of ewg was determined by RT-PCR analysis of oligo(dT)-primed cDNAs and exon-specific primers in exons A, B, C, D, F, G, H, and J. Figure 1B shows the locations of primer pairs, and Table 2 summarizes the expected sizes for spliced and nonspliced products. Table 3 summarizes the PCR products detected by using exon-specific primers.
(i) Introns 2, 4, and 5. Analysis of RT-PCR products using exon-specific primers revealed that introns 2, 4, and 5 are excised efficiently, as in each case a single band representing the spliced product was observed for head or body RNA. Moreover, for each primer set the band densities in both head and body lanes were comparable (Fig. 2, lanes 3, 4, 19, 20, 12, and 13). The efficient splicing of these introns, which are all small (Table 4), is to be expected since small introns are known to be spliced efficiently in Drosophila (29). However, the observation that the PCR bands for both body and head RNAs were comparable was unexpected, as it implied that nonneural tissues expressed ewg RNA to a greater extent than previously thought (see below and Fig. 3). In these RNA samples, levels of a control ribosomal protein transcript, rp49, are similar in the two tissues (Fig. 2, lanes 17 and 18).
(ii) Intron 1. Intron 1 is more efficiently spliced in body than in head RNA (Fig. 2, lanes 1 and 2). This inefficient splicing in heads perhaps explains the SC3 cDNA, which contains part of intron 1 (Fig. 1A).
(iii) Introns 3a, 3b, and 3c. Intron 3a is inefficiently spliced in both heads and bodies, as both spliced (150-bp lower band) and unspliced (449-bp upper band) products are observed in RT-PCRs using primers 3aF and 3aR with poly(A)+ head and body RNAs (Fig. 2, lanes 5 and 6). The excision of intron 3c (154-bp lower band) occurs more efficiently in adult heads than in bodies (lanes 9 and 10), as the body lane shows the unspliced product (highest, 961-bp band in lane 9), while it is undetectable in heads under these assay conditions (lanes 9 and 10). Two unexplained bands were present predominantly in the body lane in RT-PCRs using primers 3cF and 3cR (lanes 9 and 10), representing a new exon (see below). Thus, both introns 3a and 3c are retained in a fraction of body transcripts, while 3a is also retained in a fraction of head RNAs.
[Table footnote fragment; last table row: rp49] … Fig. 2 and 4D. Abundance was assessed as +++, ++, +, +1/2, or ND (not detected) for each head and body pair by visual inspection; the strongest band of each primer pair was arbitrarily assigned a value of +++. In some instances, bands designated +1/2 might not be discernible in the figure. In, intron; Ex, exon; ret, retained; spl, spliced.
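The band sizes quoted for introns 3a and 3c allow a simple consistency check: if the upper band is the fully unspliced product of the same primer pair, the size difference between the two bands should equal the length of the retained sequence. The sketch below does that subtraction for the sizes given in the text. The function name `retained_length` is illustrative, and the result rests on the assumption that the intron is retained in full and that band sizes are exact; it is not a value taken from the paper's tables.

```python
# Estimate retained-sequence length from paired RT-PCR band sizes.
# Assumption: both bands come from the same primer pair and differ
# only by the retained intron. Names here are illustrative.

def retained_length(unspliced_bp, spliced_bp):
    """Length (bp) of sequence present in the upper band but not the lower."""
    if unspliced_bp <= spliced_bp:
        raise ValueError("unspliced product must be larger than spliced product")
    return unspliced_bp - spliced_bp


# (unspliced, spliced) band sizes in bp, as quoted in the text.
BANDS = {
    "intron 3a (primers 3aF/3aR)": (449, 150),
    "intron 3c (primers 3cF/3cR)": (961, 154),
}

retained = {name: retained_length(u, s) for name, (u, s) in BANDS.items()}

for name, length in retained.items():
    print(f"{name}: about {length} bp retained")
```

A band whose size falls between the spliced and unspliced products, such as the unexplained body-enriched bands, is not explained by full intron retention under this assumption and points to partial inclusion of intronic sequence.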
Since both introns 3a and 3c are inefficiently spliced, we wondered if splicing events that exclude exon D also occur, as was previously indicated by a partial cDNA, SC1 (9). Primer […] Table 2). Using a bridge primer that hybridizes to both exons C and F, we confirmed that the exclusion of exon D occurred at equal levels in heads and bodies (data not shown). Further, a band representing transcripts where both introns 3a and 3c are spliced (637 bp) is highly enriched in heads. Again, two additional bands were present almost exclusively in the body lane due to the presence of a new exon.
[Table fragment] 4C, 9 and 10: ExA-D,F,G (2,310); 4B, 3 and 4: ExB-D,F (933) +++ +1/2; 4C, 6 and 7: ExB-F (1,007) ND ND +. Footnote a: Summary of RT-PCR data from Fig. 3A and 4A to C. Transcripts with SC3-like splicing are underlined. Abundance was assessed as +++, ++, +, +1/2, or ND (not detected) for each head and body pair by visual inspection; the strongest band of each primer pair was arbitrarily assigned a value of +++. In some instances, bands designated +1/2 might not be discernible in the figure. In, intron; Ex, exon.
(iv) Intron 6. Using primer set 6F/6R and head RNA, only one band (278 bp) expected from splicing of intron 6 was seen. However, in the body RNA lane, two relatively faint bands of equal intensities were observed; the lower band represents the splicing of intron 6, while a slightly larger and unexpected band represents a second new exon (Fig. 2, lanes 14 and 15; see below). The low level of spliced product in the body lane suggests that intron 6 is inefficiently spliced and/or that some transcripts terminate within it.
In summary, the results indicate the following. (i) The spliced products resulting from the excision of introns 2, 4, and 5 are present at similar levels in head and body, implying that ewg RNA is expressed outside the nervous system, likely in many tissues. Further, the splicing of these introns is unlikely to be regulated in neurons, as no significant differences are detected between neuron-enriched heads and neuron-poor bodies. (ii) The excision of introns 3c and 6 takes place at a higher efficiency in heads, which results in higher levels of mRNAs that encode the 116-kDa EWG protein (SC3 ORF) in adult heads than in bodies (Fig. 1A). This suggests that the splicing of introns 3c and 6 is likely to be regulated in neurons. (iii) Introns 3a and 3c are retained in a fraction of polyadenylated ewg RNAs, demonstrating that these introns are not spliced efficiently. (iv) ewg RNA undergoes alternative splicing in both heads and bodies by excluding exon D. (v) Levels of intron 3b splicing are similar in heads and bodies. (vi) Body-enriched novel PCR bands were detected in the region of introns 3c and 6. That body tissue is representative of neuron-poor splicing events was supported by the identical splicing profile of ewg in abdomen RNA, which is more neuron poor than that of adult bodies, which contain the thoracic and abdominal ganglia (data not shown).
Characterization of new ewg exons E and I. The novel bands were isolated and directly sequenced on both strands with the primers that had been used for their amplification to determine if they resulted from additional exons in the ewg gene. The sequence of the 230- to 250-bp product detected in the 3aF/3cR (Fig. 2, lane 7) and 3cF/3cR (lane 9) PCRs revealed the presence of a 74-bp exon in intron 3c. This new exon, E in Fig. 1A, codes for 24 amino acids and alters the translational frame. Further RT-PCR analysis of exon E revealed high enrichment in female abdomens (data not shown).
Sequencing of the upper band amplified from body RNA with primers 6F and 6R revealed the presence of a 38-bp exon (Fig. 1D), exon I in Fig. 1A, present within intron 6. Exon I is exclusive to ewg transcripts in bodies, encodes 12 amino acids, and also alters the translational frame (Fig. 2, lane 14; see also Fig. 4B, lanes 15 and 17).
The 5′ and 3′ splice sites for the new exons matched the splice site consensus (Table 4) (27). Moreover, the flanking introns have sequences that match candidate branch point sequences at appropriate distances from the relevant 3′ splice site (26). Thus, exons E and I fit the criteria of authentic exons capable of being spliced appropriately. Both of these new exons match the genomic sequence except for one nucleotide substitution in exon E (Fig. 1C).

FIG. 2. Characterization and comparison of ewg splicing in wild-type adult tissues. All RT-PCR assays were carried out with DNase I-treated total RNA isolated from 2-day-old heads (H) or bodies (B). The italicized letters below each pair of lanes represent the specific splice events as outlined in Fig. 1B, e.g., primers 3aF and 3aR for intron 3a splicing. These data are summarized in Table 2, which also lists the lengths of PCR products. rp49 transcripts were used as a control. Molecular size markers (GIBCO-BRL) are shown in lanes 11 and 16. Note that splicing of introns 3a, 3c, and 6 in heads is mostly in the mode of the SC3 cDNA.

[Table footnote fragment] … (26) were used to determine highly divergent splice sites, i.e., splice sites that have at least 40% of the nucleotides that occur at a frequency less than 20% or nearly 50% of the nucleotides that occur at a frequency less than 30%. Upper- and lowercase letters represent exon and intron sequences, respectively.
ewg RNA is abundant in heads and bodies. Comparison of levels of efficiently spliced introns 2, 4, and 5 (Fig. 2A, lanes 3, 4, 12, 13, 19, and 20) suggests that ewg RNA is present in heads and bodies at comparable levels. To verify that ewg RNA was indeed present at high levels in bodies, either random hexamers or a gene-specific primer in exon H was used for RT, allowing the amplification of all splice isoforms of ewg regardless of their state of polyadenylation in subsequent PCR. For both reactions, primers 4F and 5R yielded very similar signals with head and body RNAs (Fig. 3A, lanes 1, 3, 6, and 8). Primers 3cF and 5R revealed that both tissues contain ewg RNAs that either retain or excise intron 3c (lanes 2, 4, 7, and 9). Exon E-containing transcripts were detected mostly in bodies (lanes 2 and 7). Also, some intron 3c-1 retention in body RNA is observed (lanes 2 and 7). Thus, ewg RNAs appear to be efficiently polyadenylated, as the ewg splicing profiles are similar for cDNAs primed with oligo(dT), gene-specific, and random hexamer primers.
To verify that PCR amplification is in the linear range, the accumulation of the spliced product was assessed every two cycles from cDNAs primed with a gene-specific primer in exon H. Comparable signals were obtained in head and body lanes throughout the linear range of amplification using primers 4F and 5R (Fig. 3B and C; Fig. 1). Thus, ewg transcript is expressed in bodies and heads at similar levels.
Alternative splicing of ewg exons. The RT-PCR studies suggested that both head and body RNAs have populations of alternatively spliced transcripts that include or exclude exon D and that body-specific transcripts are enriched in transcripts that include exon I. Further, the splicing of introns 3a and 3c and splicing of intron 6 in bodies appear to be independent of each other since ewg transcripts containing exon I were found with or without exon D (436 bp in Fig. 4A, lane 6; 451 bp in lane 7; summarized in Table 3).
To further support these data and to test whether any other exons are alternatively spliced, PCR was done with primer pairs spanning several exons. No further alternatively spliced ewg transcripts were detected (Fig. 4B and C). Thus, alternative splicing in ewg transcripts is restricted to introns 3a, 3c, and 6.
ewg is inefficiently spliced. To determine if introns other than 3a and 3c were present in poly(A)+ ewg RNA, PCR was done with intron-specific primers. Introns 1, 3a, 3c, and 6 (Fig. 4D) are present in poly(A)+ ewg transcripts of heads and bodies. Of these introns, only intron 1 is differentially retained in body RNA compared to head RNA. All of these introns are larger than 81 bp and are classified as large Drosophila introns (26). Among these introns, the 5′ and 3′ splice sites of 3c and the 3′ splice site of 3a diverge significantly from the Drosophila consensus (Table 4) (26). Introns 2, 4, and 5 were not detected in poly(A)+ ewg transcripts, confirming that they were efficiently spliced (Fig. 4D, lanes 3, 4, 10, 11, 12, and 14; Table 3). The overall levels of unspliced ewg transcripts were assessed in assays using primer 3cF and return primer RV in intron 6. This analysis indicated the presence of greater amounts of unprocessed ewg transcripts containing both intron 3c and intron 6 in bodies than in heads (Fig. 4A, lanes 9 and 10; data summarized in Table 3).
The splicing situation in the region of intron 6 is complex, as evidenced by two spliced transcripts that exclude or include exon I and RNAs that retain intron 6, and possibly transcripts that terminate in intron 6, most of which show differential distribution in heads and bodies. First, intron 6 is spliced more efficiently in head RNA than in body RNA, where overall splicing appears to be significantly reduced (Fig. 4D, lanes 15 and 16). Second, about half of the spliced body transcripts show inclusion of exon I (Fig. 2, lanes 14 and 15; Fig. 4B, lanes 14 to 17). Finally, RNAs that retain proximal intron 6 sequences are more prevalent in body RNA, as seen by use of the return primer RV, 5′ to the polyadenylation sites in intron 6 (Fig. 4A, lanes 1, 2, 9, and 10; Fig. 4C, lanes 2 to 5). In contrast, when the return primer in exon J is used, many fewer products are amplified in bodies than in heads (Fig. 4A, lanes 1 to 4; Fig. 4B, lanes 14 to 17). This result suggests that in bodies, exon J-containing RNA is underrepresented compared to RNA containing the 3′ region of intron 6. Thus, some RNAs in the body may be terminated before exon J, using the putative cleavage/polyadenylation sites in intron 6 (nucleotides 5876 and 6742, AATAAA). The presence of such transcripts was previously suggested by a partial cDNA, SC1, that in addition to excluding exon D also retained part of intron 6 (9).

FIG. 3. Abundance of ewg transcripts is independent of polyadenylation and is equal in heads (H) and bodies (B). (A) RT-PCR using random-primed or gene-specific-primed cDNAs. The RT reaction shown in lanes 1 to 4 was primed with a primer in exon H, an exon common to all ewg transcripts. The RT reaction shown in lanes 6 to 9 was primed with random hexamers. The italicized letters below each pair of lanes represent the specific splice events assayed as outlined in Fig. 1B. The uppermost bands in lanes 2, 4, 7, and 9 show ewg transcripts containing intron 3c. The 408-bp band in lanes 2 and 7 contains exon E. Molecular size markers (GIBCO-BRL) are shown in lane 5. (B and C) Cycle titration of PCRs using primers 4F and 5R to amplify parts of ewg transcripts common to all ewg transcripts. cDNAs were synthesized with a gene-specific primer in exon H. Aliquots were removed from the PCR beginning at cycle 18 and continuing until cycle 28. Quantitation of bands reveals that PCR is in the linear range (data not shown). Note that the intensities of bands in both heads and bodies are similar at all cycles.

VOL. 19, 1999 POSTTRANSCRIPTIONAL REGULATION OF erect wing

Expression of 116-kDa protein is able to rescue ewg-null phenotypes. We previously demonstrated that the 116-kDa protein is able to rescue the three well-characterized phenotypes associated with ewg mutations: embryonic lethality, erect wings, and formation of dorsal longitudinal muscles (8,10). Expression from two cDNAs expressing ewg minigenes, EWG NS (neuron specific) and EWG HS (basal level expression), rescued viability of ewg l1, an ethyl methanesulfonate-induced lethal allele, which was thought to be a genetic null and 116-kDa protein null (10). The possibility that the ewg locus was formally able to generate several other isoforms made us wonder if ewg l1 was a true null for all possible EWG proteins. Sequencing of genomic DNA from exons B to D revealed a C-to-T base pair change in exon B that resulted in the termination of the ORF at amino acid 187.
Since exon B is part of all putative EWG isoforms, the ewg l1 allele is a functional null allele for all possible EWG proteins; moreover, it lacks the DNA binding domain.
The rescue of ewg-associated phenotypes by the transgenes expressing the 116-kDa EWG protein was further confirmed by using a synthetic genomic deletion of the ewg locus as described in Materials and Methods (data not shown). Thus, although it is formally possible that several EWG isoforms are generated, the 116-kDa EWG protein is sufficient to provide the known EWG functions. We cannot rule out, however, the possibility that flies rescued by the 116-kDa EWG protein have subtle abnormalities that were not discerned.
Putative isoforms encoded by the ewg locus. ewg can potentially encode several polypeptides in addition to the 116-kDa EWG protein that includes exon D. Figure 5 depicts the conceptual ewg-generated isoforms. The presence or absence of exon D, which encodes 154 amino acids, does not interrupt the translational frame of the ewg ORF, while inclusion of exon E or I alters the translational frame, leading to a premature stop compared to that of the SC3 ORF (Fig. 5). Additional protein isoforms can also be generated by the ewg transcripts that retain either intron 3a, 3c, or 6. All intron 3a-, 3c-, and 6-containing RNAs result in a premature stop in the SC3 ORF, encoding 408, 574, and 840 amino acids, respectively.
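The frame statements in this paragraph follow from simple modular arithmetic on exon lengths: an included or skipped exon preserves the downstream reading frame only if its length is a multiple of 3. The sketch below applies that check to the 74-bp exon E and 38-bp exon I reported earlier in the Results, and to exon D. The exon D nucleotide length is an assumption derived here as 154 codons × 3 and is not quoted directly in the text; the function name `frame_effect` is illustrative.

```python
# Check whether including or skipping an exon preserves the reading frame.
# Names are illustrative; EXON_D_NT is an assumption (154 aa x 3 nt).

def frame_effect(length_nt):
    """Return (whole codons, leftover nucleotides) for an exon length."""
    return divmod(length_nt, 3)


EXON_E_NT = 74        # reported in the text
EXON_I_NT = 38        # reported in the text
EXON_D_NT = 154 * 3   # assumed: 154 amino acids, frame preserved

for name, length in [("D", EXON_D_NT), ("E", EXON_E_NT), ("I", EXON_I_NT)]:
    codons, leftover = frame_effect(length)
    status = "frame preserved" if leftover == 0 else f"frame shifted by {leftover} nt"
    print(f"exon {name}: {length} nt = {codons} codons + {leftover} nt -> {status}")
```

Under these inputs, exons E and I each leave 2 nucleotides beyond their 24 and 12 whole codons, which is consistent with the reported amino acid counts and with the frame shift that leads to a premature stop.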
All putative EWG isoforms contain exons B and C. These exons also show the highest homology with the DNA binding motifs of other ewg-like proteins (4,18,31). Omitting exon D from the SC3 ORF, however, significantly increased the homology of EWG to sea urchin P3A2 compared to its homology in the alignment in reference 31, which was done with exon D. Protein sequences encoded by exon E, exon I, and retained introns did not reveal any significant homologies.

[Figure caption fragment] … Table 3 provides a complete listing of PCR products and summarizes the data. The forward primer was 3aF or 3cF, the return primer was RV, 6R, or 38R. Molecular size markers (GIBCO-BRL) are shown in lanes 5 and 11. In lane 8, a lambda HindIII digest was used as a marker. Note that heads and bodies show differences in the ewg transcript population and abundance due to the body-enriched usage of exons E and I, while increased inclusion of exon D occurs in heads. (B and C) Primer pairs spanning several introns reveal no additional alternative splice events. The italicized letters below each pair of lanes show the amplified section of ewg transcripts. Table 3 […] Table 2. Note that differences between heads and bodies are mainly detected in the retention of introns 1 and 6 but not 3. Molecular size markers (GIBCO-BRL) are shown in lanes 5 and 13.
Are isoforms other than the 116-kDa protein synthesized? These isoforms should be detectable in immunoblot analysis by the polyclonal antibody generated against the 116-kDa protein, as it recognizes epitopes throughout the protein (Fig. 6A). In fact, the anti-EWG antibody reveals several bands on immunoblots, although the 116-kDa protein is the major band (Fig. 6B) (10). To determine whether the additional bands are ewg related, we analyzed wild-type (genomic ewg), ewg l1; EWG NS4 (expression of SC3 ORF in neurons), and ewg l1 (protein-null) embryonic extracts. Comparison of these extracts should indicate whether the additional bands are protein isoforms of EWG, 116-kDa protein degradation products, or unrelated cross-reacting proteins. As expected, the 116-kDa band was absent from the ewg l1 lane (Fig. 6B). Further, the wild-type and ewg l1; EWG NS4 patterns were identical, with the prominent 116-kDa band and several minor bands which are likely products of degradation; the other bands were common to all three extracts. Consistent with the embryonic data, wild-type and ewg l1; EWG NS4 (EWG NS4 is not transcribed in nonneural tissues) adult heads or abdomens showed no significant differences in their immunoblot profiles (Fig. 6C and D and data not shown). Thus, the 116-kDa protein is the major isoform, and other putative isoforms are either minor or not synthesized.
Overexpression of 116-kDa EWG is lethal. The splicing regulation of ewg guarantees that the 116-kDa protein is generated in the head but down regulated in the body. To test if this regulation is biologically important, the effect of overexpression of 116-kDa protein in nonneural tissues was tested. Overall overexpression was achieved by using EWG HS1 and EWG HS7, two independent insert lines for a transgene in which the SC3 cDNA is driven with the heat shock promoter. The expression from the EWG HS transgene at 25°C mimics endogenous transcription, likely because of the inclusion of regulatory sequences upstream of the ORF (SC3 cDNA in Fig. 1A). Both of these transgenes lead to lethality when homozygous. We determined the viability of flies with different doses of the wild-type ewg allele that also carry EWG HS1 and EWG HS7 (Table 5). Indeed, flies carrying two doses of heat shock transgenes are less viable, and the males have a lower viability index than females. Males of genotype ewg+/Y; EWG HS7/Tb or ewg+/Y; EWG HS1/+ have viability indices of 0.76 and 1.00, respectively, but the viability index of ewg+/Y; EWG HS7/Tb; EWG HS1/+ drops to 0.005. Female survival is better than male survival, but the viability indices are still low: 0.10 and 0.12 for females carrying both transgenes and one or two doses of wild-type ewg, respectively (Table 5).

FIG. 5. ORFs of ewg splice isoforms deduced by RT-PCR analysis. All deduced ewg RNA isoforms are outlined from data presented in Table 2. The size of each ORF is given in amino acids (aa). Transcripts including exon E and I are not shown since the ORF terminates in exon F. Also, ORFs resulting from intron-retaining transcripts are not shown.
The level of expression of 116-kDa protein in different genotypes was assessed in immunoblots (Fig. 6C and D). Comparison of 116-kDa protein levels between head extracts of wild-type females and ewg l1 females carrying both EWG HS transgenes shows that levels of 116-kDa signals are comparable in these two genotypes but lower in ewg l1 flies carrying only a single dose of EWG HS1 or EWG HS7 (Fig. 6C). Since both neuronal and nonneuronal tissues contribute to the signal in head extracts of flies carrying EWG HS transgenes, the neural expression in flies carrying both transgenes is likely to be much lower than the neural level of the 116-kDa protein in the wild type. Therefore, the EWG HS -driven 116-kDa protein expression in the neural tissue is not likely to be the cause of lethality of the EWG HS7 /Tb; EWG HS1 /+ genotype.

DISCUSSION
EWG protein provides a vital function in the nervous system, as exclusive neural tissue-specific expression of the 116-kDa EWG protein rescues the lethality caused by the ewg-null alleles. This neural role of ewg is underscored by a robust expression of the 116-kDa protein in neurons at all stages. In fact, the high expression of EWG seen with anti-EWG antibody and the functional information had led us to assume that the weak signal seen outside the nervous system (with the exception of myoblast expression) with the antibody staining in larvae and adults was due to high background (9). Thus, it is surprising and puzzling that despite the broad distribution of the transcript, the two known functions of ewg are associated with specific cell types: neurons and myoblasts. The studies described in this paper show that although the gene is broadly transcribed, the functional transcript encoding the 116-kDa protein is highly enriched in neuron-rich head tissue compared to that in neuron-poor bodies. The neurons in the body must contribute to the functional transcript observed in the body RNA; however, our data do not address if cell types other than neurons are also capable of low-level production of the functional transcript.
Generally, transcriptional controls ensure that specific proteins are synthesized in specific tissues. In the case of ewg, however, posttranscriptional mechanisms are critical to ensure that the 116-kDa protein is present at high levels in neurons. Few examples that show broad tissue distribution of transcript but restricted protein expression are known. In Drosophila, the gene encoding P transposase is transcribed broadly but the productive splice is made only in the germ line cells (1). A second example is the Drosophila Sex-lethal gene (Sxl), which is transcribed in both males and females but generates the functional Sxl protein only in females (22). Similar to the case of ewg, in both of these instances alternative splicing is involved; in the case of Sxl, a large number of intron-containing transcripts are also present (28).
Efficiency of alternative splice events that result in SC3-like transcript is higher in head RNA. Splice events that are crucial for the production of the functional SC3 transcript are inclusion of exons D and J and exclusion of exons I and E. Inclusion of exon D requires splicing of introns 3a and 3c instead of 3b. The 3′ splice sites of both 3a and 3c and the 5′ splice site of 3c diverge from the Drosophila consensus (Table 4) (26), making them likely targets for splicing regulation. Intron 3a is inefficiently spliced in both head and body RNAs, whereas 3c is inefficiently spliced in body RNA only. Since exclusion of D resulting from exon skipping is seen in both body and head RNAs, it is likely to be the default mode, with inclusion of D requiring a positive regulatory step. To what extent the inefficient splicing of 3a and 3c affects this regulation is difficult to assess. SC3 transcript also requires the appropriate choice of a 5′ splice site, resulting in the excision of intron 6. The body RNA shows inefficient splicing in the intron 6 region and, among the spliced products, about equal amounts of exon I inclusion and intron 6 excision. It is likely that RNAs that retain intron 6 become polyadenylated as consensus sequences for polyadenylation exist in intron 6; in case these are used, transcripts that encode different C termini will be generated.
Splicing of introns 1 (~4.5 kb), 3a (299 bp), 3c (807 bp), and 6 (1,722 bp) is inefficient; all these are large introns. Some examples of intron retention in Drosophila include transcripts of Dopa decarboxylase (2), Suppressor-of-white-apricot (34), and Sxl (28). Whether intron-containing ewg transcripts exit the nucleus is not known, although the presence of 5′ and 3′ splice sites in a transcript is often sufficient to retain most intron-containing RNAs in the nucleus (23). However, the presence of intron-containing messages in the cytoplasm has been reported; examples include insulin pre-mRNA (33) and bovine growth hormone pre-mRNA (11). Intron retention can be a powerful means of gene regulation; the Drosophila Transformer-2 protein self-regulates the retention of intron M1 in germ line tissue (25). This intron retention in tra-2 mRNA has important phenotypic consequences, and it is thought to help maintain appropriate levels of Tra-2 protein in germ line tissues (25). Intron retention can also be a means to generate alternate protein isoforms, examples of which include C-CAM3 and rVDR1 (7,12).
Alternative splicing is a widespread mechanism to regulate functional properties of DNA binding proteins to modulate DNA binding activity and dimerization properties (reviewed in reference 24). Since all conceptual alternative EWG isoforms contain the DNA binding domain encoded by exon B, these proteins are likely to differ in either dimerization or activation properties. The activation domain of NRF-1, a protein that has homology to EWG, has several glutamine-containing hydrophobic clusters (20). There are five such clusters in head-enriched exon D and two in exon H. Thus, the head-enriched 116-kDa protein may have stronger activation properties than proteins that lack exon D. The dimerization region for EWG has not been characterized. Are any of the putative isoforms synthesized, do they have a function, or will they provide ewg function? The putative alternate protein isoforms have no essential role in the nervous system, as the 116-kDa EWG protein rescues viability of null ewg alleles, although they may still provide subtle functions in vivo or substitute for the 116-kDa function. Whether the alternative proteins are synthesized at low levels is not known.
The studies with heat shock transgenes show that overexpression of 116-kDa protein in nonneural tissues can be lethal, yet it is the expression of the 116-kDa protein in neurons that provides the viability function associated with ewg. With two doses of the EWG HS transgenes, both ewg l1 and ewg l1 /+ flies showed about equal lethality, suggesting a formally neomorphic effect. The lethality could result from the expression of 116-kDa protein in nonneural tissue or be due to general overexpression. An alternative possibility, that the transgene generates a toxic novel protein, cannot be discounted but is unlikely because the transgene was constructed by using a cDNA and not genomic DNA. Since the 116-kDa protein levels in the head with two doses of EWG HS are comparable to the wild-type levels, the protein is unlikely to attain a higher than normal level in the neural tissue. Moreover, higher than normal levels of 116-kDa protein generated through two doses of EWG NS transgenes, which are expressed only in the nervous system, are not lethal (our unpublished observations). Therefore, expression outside the nervous system is likely to be the primary cause of lethality. Thus, given that ewg is broadly transcribed, its posttranscriptional regulation is crucial to the fulfillment of its function.