Operon and non-operon gene clusters
in the C. elegans genome*
Thomas Blumenthal1§, Paul Davis2 and Alfonso Garrido-Lecca1
Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder,
CO 80309 USA
Wormbase, European Bioinformatics Institute, EMBL, Wellcome Trust Genome Campus,
Hinxton, CB10 1SD, UK
Table of Contents
1. Trans-splicing and operons
2. SL2-type operons
3. Hybrid operons
4. SL2-type operons with long spacing
5. SL2-type operons with juxtaposed 3’ end formation and trans-splice sites
6. SL2 trans-splicing at downstream exons
7. Alternative operons
8. SL1-type operons
9. Dicistronic mRNAs
10. Overlapping genes
11. Identification of operons
12. Regulation in and of operons
13. Genome architecture
14. Tables 1-8
Abstract
Nearly 15% of the ~20,000 C. elegans genes are contained in operons, multigene clusters controlled by a single promoter. The vast majority of these are of a type where the genes in the cluster are ~100 bp apart and the pre-mRNA is processed by 3’ end formation accompanied by trans-splicing. A spliced leader, SL2, is specialized for operon processing. Here we summarize current knowledge on several variations on this theme including: (1) hybrid operons, which have additional promoters between genes; (2) operons with exceptionally long (>1 kb) intercistronic regions; (3) operons with a second 3’ end formation site close to the trans-splice site; (4) alternative operons, in which the exons are sometimes spliced as a single gene and sometimes as two genes; (5) SL1-type operons, which use SL1 instead of SL2 to trans-splice and in which there is no intercistronic space; (6) operons that make dicistronic mRNAs; and (7) non-operon gene clusters, in which either two genes use a single exon as the 3’ end of one and the 5’ end of the next, or the 3’ UTR of one gene serves as the outron of the next. Each of these variations is relatively infrequent, but together they show a remarkable variety of tight-linkage gene arrangements in the C. elegans genome.
* Edited by Julie Ahringer. Last revised September 2, 2014. Published April 28, 2015. This chapter should be cited as: Blumenthal T., Davis P., Garrido-Lecca A. Operon and non-operon gene clusters in the C. elegans genome (April 28, 2015), WormBook, ed. The C. elegans Research Community, WormBook, doi/10.1895/wormbook.1.175.1, http://www.wormbook.org.
Copyright: © 2015 Thomas Blumenthal, Paul Davis, Alfonso Garrido-Lecca. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
§ To whom correspondence should be addressed. E-mail: firstname.lastname@example.org
1. Trans-splicing and operons
Operons are polycistronic clusters of genes transcribed from a promoter at the 5’ end of the cluster. Although operons were once considered to be absent from eukaryotic genomes, it is now clear that they are present in the genomes of numerous eukaryotes (Spieth et al., 1993; Blumenthal et al., 2002; Guiliano and Blaxter, 2006). For example, the Drosophila genome contains 30 dicistronic clusters that produce mRNAs, each encoding two different genes (Misra et al., 2002). Most operons in eukaryotes are of a different type, in which the polycistronic pre-mRNA arising from the operon is co-transcriptionally processed by 3’ end formation and spliced leader (SL) trans-splicing between the genes to make monocistronic mature mRNAs (Blumenthal, 2004). In SL trans-splicing, a short spliced leader RNA exon is spliced onto the 5’ end of the pre-mRNA by conventional splicing mechanisms, providing a cap for the downstream mRNAs.
Nearly 15% of the ~20,000 protein-coding genes in the C. elegans genome are organized into ~1250 operons, tight clusters of two to eight genes (Allen et al., 2011). In most cases the genes are separated by an intercistronic region of ~100 bp from the polyA addition site of the upstream gene to the trans-splice site of the downstream gene.
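The intercistronic distance defined above (polyA addition site of the upstream gene to trans-splice site of the downstream gene) can be sketched as a small calculation. This is only an illustrative sketch: the function name, the coordinate convention (both sites given as positions on the same chromosome, distance taken as their simple difference along the direction of transcription), and the example coordinates are assumptions, not values or methods taken from the chapter.

```python
def intercistronic_distance(polya_site: int, trans_splice_site: int, strand: str) -> int:
    """Return the distance in bp from the upstream gene's polyA addition site
    to the downstream gene's trans-splice site, measured along the direction
    of transcription.

    Assumption: both positions use the same genomic coordinate system, and the
    distance is the plain coordinate difference (off-by-one conventions vary
    between annotation formats).
    """
    if strand == "+":
        distance = trans_splice_site - polya_site
    elif strand == "-":
        # On the minus strand, transcription runs toward smaller coordinates.
        distance = polya_site - trans_splice_site
    else:
        raise ValueError("strand must be '+' or '-'")
    if distance < 0:
        raise ValueError("trans-splice site lies upstream of the polyA site")
    return distance


# Hypothetical coordinates, chosen only to illustrate a ~100 bp spacing.
print(intercistronic_distance(10_000, 10_105, "+"))  # plus-strand gene pair
print(intercistronic_distance(10_105, 10_000, "-"))  # minus-strand gene pair
```

Both example calls describe the same spacing seen from opposite strands, which is why the strand has to be passed explicitly.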
The polycistronic pre-mRNA is processed by coordinated 3’ end formation of the upstream gene and trans-splicing at the 5’ end of the downstream gene. This trans-splicing event involves a spliced leader, SL2, specialized for operon pre-mRNA processing.
However, SL1 trans-splicing is the more common kind of trans-splicing in C. elegans. Most SL1 trans-splicing occurs near the 5’ ends of genes rather than downstream in operons. This removes the outron, the RNA between the transcription start site and the first 3’ splice site in the pre-mRNA. A rare type of operon, SL1-type, uses SL1 for trans-splicing a downstream gene. SL1-type operons are mechanistically quite interesting since they have no intercistronic sequence; polyadenylation of the upstream gene occurs right at the trans-splice site of the downstream gene (Williams et al., 1999). In these operons, 3’ end formation may occur by SL1 trans-splicing of the downstream gene, resulting in a free 3’ end upstream that may then be debranched and polyadenylated. Thus, in these cases, the same processing event at least sometimes may serve to create the 3’ end of the upstream gene mRNA and the 5’ end of the downstream gene mRNA.
Here we present an in-depth analysis of gene clusters in the C. elegans genome in an effort to determine whether they all represent true operons and whether there are alternative ways of processing operon pre-mRNAs.
Several variations of the SL2-type operon are considered. These include the relatively common hybrid operons, which have longer intercistronic regions that accommodate an extra promoter between the genes (Huang et al., 2007; Whittle et al., 2008), as well as several less common variations on the operon theme. For example, some operons have an extra polyadenylation signal (AAUAAA) near the trans-splice site that results in 3’ end formation close to the site of trans-splicing. There are some operons with unusually long (>1 kb) intercistronic regions (Morton and Blumenthal, 2011). Alternative operons are sometimes spliced as if they are a single gene (Jan et al., 2011; Morton and Blumenthal, 2011). The existence of a variation on the SL1-type operon, in which the upstream mRNA is discarded rather than being polyadenylated, is described. In this case, the expression of the upstream gene occurs from a dicistronic mRNA. The worm genome also contains occasional examples of operons that make dicistronic mRNAs, which are apparently translated in that form, much like the Drosophila operons. However, it is currently unknown how translation of the downstream cistron is initiated. Finally, two types of tightly linked non-operon gene clusters are identified. In one type, a single exon serves as the 3’ end of an upstream gene as well as the 5’ end of a downstream gene. In another type, the 3’ UTR of an upstream gene serves as an outron for SL1 trans-splicing of a downstream gene. These are not operons since the genes are not transcribed from the same promoter. In this paper, examples of each kind of gene cluster are described, and tables with lists of known examples of each type are included.
2. SL2-type operons
Allen and coworkers mapped SL1 and SL2 trans-splice sites from deep sequencing transcriptome data (Allen et al., 2011), and used these data to identify operons supported by SL2 trans-splicing (Supplemental File 3 in Allen et al., 2011). Almost all gene clusters in the C. elegans genome are of the type exemplified in Figure 1, in this case a four-gene operon. SL2-type operons contain from two to eight genes expressed from a single promoter at the 5’ end of the cluster. We have performed chromatin immunoprecipitation experiments with an Affymetrix tiling array (ChIP/chip) for H3K9 acetylation (H3K9ac), an epigenetic mark of active promoters. The data show that for most operons there is only a single H3K9ac peak, found at the 5’ end of the cluster. An example is shown in Figure 1.
The genes are very close together: usually ~100 bp separate the 3’ end of the upstream gene (the site of polyA addition) from the trans-splice site of the downstream gene, and virtually all trans-splicing is to SL2. Furthermore, Allen et al. demonstrated a strong relationship between the intercistronic distance and the percent of trans-splicing to SL2: shorter distance correlates with increased SL2. There are around 1250 operons of this kind, a subset of which are hybrid operons considered in the next section.
Figure 1. A typical SL2-type operon.
The figure shows a four-gene operon with exons shown as colored boxes and introns as angled lines. The direction of transcription is from left to right, as indicated by the arrows on the 3’ UTRs. The green bar denotes the extent of the operon. The pre-mRNA is processed to yield multiple mature mRNAs. The first gene in the operon rnf-121 is trans-spliced mostly to SL1, whereas the three downstream genes are trans-spliced mostly to SL2 (data from Allen et al. 2011). In each case, the 3’ cleavage sites are ~100 bp upstream of the SL2-specific trans-splice sites.
This is a download from WormBase. There are 1200 documented operons of this type in the C. elegans genome (Allen et al., 2011). Here and throughout, the numbers of SL1 and SL2 reads shown are from the deep sequencing project reported in Allen et al. (2011). Sequencing was performed on RNA isolated from several different worm stages and the data were pooled. Below the operon bar, H3K9ac ChIP/chip performed on young adult worms, which marks active promoters, is shown. The length of the vertical lines is proportional to the intensity of the hybridization signal corresponding to the indicated genomic position. The horizontal black bar marks a peak called by the data analysis program.
3. Hybrid operons
Many operons have the characteristics described above, but have another important feature: in addition to the upstream operon promoter, they contain a second promoter within the operon. This has been demonstrated in several ways. Transgenic constructs containing only the intercistronic DNA fused to a reporter gene were found to be expressed in patterns characteristic of the endogenous genes (Huang et al., 2007). When analyzed by ChIP for proteins previously shown to peak at promoters (including the variant histone HTZ, the histone modification H3K4Me3, and RNA polymerase II), many operons were found to contain peaks between genes (Whittle et al., 2008; Baugh et al., 2009). Furthermore, some operons have a peak of Ser-5-phosphorylated RNA polymerase, which typically occurs near the 5’ ends of genes, between genes in operons (A. Garrido-Lecca and T. Blumenthal, unpublished observations). Finally, SL1 trans-splicing, generally associated with sites close to promoters, occurs preferentially at some downstream operon genes with long intercistronic regions (Allen et al., 2011). Further validation of the relationship between an internal promoter and SL1 trans-splicing was provided by analysis of two deletion strains. A deletion within a hybrid operon intercistronic region dramatically reduced SL1 usage at the downstream trans-splice site, while increasing the SL2 trans-splicing at this site. In contrast, a deletion at the 5’ end of a different hybrid operon dramatically reduced the SL2 trans-splicing at a downstream gene, leaving the SL1 trans-splicing unchanged (Allen et al., 2011).
The intercistronic region from a typical hybrid operon is depicted in Figure 2. The downstream gene in this operon receives significant levels of SL1, presumably from the internal promoter, and of SL2, presumably from transcripts originating from the promoter at the 5’ end of the operon. Note also the 500 bp intercistronic region.
The ChIP/chip peak of H3K9ac between the genes indicates the location of an internal promoter (Figure 2).
Figure 2. The intercistronic region from a typical hybrid operon.
In this kind of operon, there is a promoter at the 5’ end of the cluster (not shown) and an additional promoter between the genes as shown by H3K9ac ChIP/chip data (bottom). In the example shown here, the intercistronic distance is longer than in a typical operon (~500 bp). Transcripts from the promoter at the 5’ end of the cluster undergo 3’ end formation of C23H3.5.2, along with SL2 trans-splicing at the 5’ end of sptl-1. The internal promoter is responsible for synthesis of an outron that is trans-spliced by SL1 at the same site. In this example there are 192 SL2 and 481 SL1 reads at the site indicated by the vertical arrow for sptl-1 (Allen et al., 2011). Below the operon bar H3K9ac ChIP/chip, which marks active promoters, is shown. The length of the vertical lines is proportional to the number of reads corresponding to the indicated genomic position. The horizontal black bar marks a peak called by the data analysis program.
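The read counts quoted for sptl-1 in Figure 2 can be turned into a percent SL2 value with a short calculation. This is a minimal sketch that assumes percent SL2 is simply SL2 reads divided by the sum of SL1 and SL2 reads at the site; the function name is hypothetical, and Allen et al. (2011) may have applied additional filtering or normalization that is not described here.

```python
def percent_sl2(sl1_reads: int, sl2_reads: int) -> float:
    """Percent of spliced-leader reads at a trans-splice site that carry SL2.

    Assumption: percent SL2 = 100 * SL2 / (SL1 + SL2), using raw read counts.
    """
    total = sl1_reads + sl2_reads
    if total == 0:
        raise ValueError("no spliced-leader reads at this site")
    return 100.0 * sl2_reads / total


# Read counts for the sptl-1 trans-splice site as quoted in the Figure 2 legend.
sptl1_pct = percent_sl2(sl1_reads=481, sl2_reads=192)
print(f"sptl-1: {sptl1_pct:.1f}% SL2")  # 192 / 673 is about 28.5%
```

Under this assumption the site receives a mixture of both spliced leaders, with SL1 in the majority, consistent with the figure's description of transcripts arising from both the upstream and the internal promoter.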
Hybrid operons usually have intercistronic distances 500 bp and from 10-80% SL2 usage at the downstream trans-splice site. While not as common as standard SL2-type operons, the hybrid operon arrangement is not uncommon in the C. elegans genome. To what extent can intercistronic distance alone predict operon status?
Intercistronic distances were calculated for genes that have the polyA addition site of another gene less than 1000 bp upstream. These genes were divided into three groups based on their percent SL2 trans-splicing: >80%, 10-80%, and <10% (Figure 3). The vast majority of genes receiving >80% SL2 are 90-120 bp downstream of another gene, whereas the rest are generally much farther from genes in the same orientation. In fact, the difference is so dramatic that genes in this close range can, with few exceptions, be diagnosed as downstream in operons based solely on intercistronic length. Furthermore, the distribution of genes with very low SL2 levels (0-10%) is quite different from that of those receiving between 10 and 80% SL2. The former are presumably not in operons, while the latter likely represent hybrid operons.
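The three-way grouping described above can be expressed as a simple rule of thumb. The sketch below is illustrative only: the function names and labels are hypothetical, the handling of values exactly at 10% and 80% is an assumption (the text does not say which group they fall into), and the 90-120 bp window is used only as the "close range" mentioned in the text, not as a validated classifier.

```python
def classify_by_sl2(pct_sl2: float) -> str:
    """Assign a downstream gene to one of the three SL2 groups from the text.

    Assumption: exactly 80% and exactly 10% are placed in the middle group.
    """
    if not 0.0 <= pct_sl2 <= 100.0:
        raise ValueError("percent SL2 must be between 0 and 100")
    if pct_sl2 > 80.0:
        return "SL2-type operon"
    if pct_sl2 >= 10.0:
        return "likely hybrid operon"
    return "presumably not in an operon"


def in_close_range(distance_bp: int) -> bool:
    """True if the intercistronic distance falls in the 90-120 bp window
    in which most genes receiving mostly SL2 are found."""
    return 90 <= distance_bp <= 120


# Hypothetical genes: (intercistronic distance in bp, percent SL2).
for distance_bp, pct in [(105, 95.0), (500, 28.5), (850, 2.0)]:
    print(distance_bp, in_close_range(distance_bp), classify_by_sl2(pct))
```

Keeping the distance check separate from the SL2 grouping mirrors the text, which treats short intercistronic length as a strong predictor of operon membership while using percent SL2 to distinguish standard from hybrid operons.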