NEW FEATURES

 

Intronic AREs added 9-Sep-2013:

Human genes with an identifiable ARE in their introns are now also captured in ARED.

 

Group 4 pattern updated 13-Oct-2012:

We relaxed the pattern used to capture group 4 AREs from the earlier WW[ATTTATTTA]WW form to WWA[TTTATTT]AWW (with one allowed mismatch outside the bracketed region). This subtle change captures a larger number of AREs that were previously missed.

 

Link to ARESite added 15-Mar-2011:

ARESite is an online resource that complements AREDOrg by providing researchers with additional information about a given ARE site. We added a link to ARESite on the results page for easy access to this resource.

 

Group 5 pattern updated 5-Mar-2008:

We relaxed the pattern used to capture group 5 AREs from the earlier WWWT[ATTTA]TTTW form to WWWT[ATTTA]TWWW (with one allowed mismatch outside the bracketed region). The relaxed form captures a significant number of AREs that were previously missed.
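
To illustrate why the WWWT[ATTTA]TWWW form is more permissive than the earlier WWWT[ATTTA]TTTW form, here is a minimal Python sketch. It assumes that W is the IUPAC code for A or T and that the square brackets only mark the core region. It checks exact matches only and ignores the one-mismatch allowance. The helper names and the example sequence are hypothetical and are not part of ARED.

```python
import re

OLD_FORM = "WWWT[ATTTA]TTTW"      # earlier form quoted in the text
RELAXED_FORM = "WWWT[ATTTA]TWWW"  # relaxed form quoted in the text


def iupac_to_regex(pattern: str) -> str:
    """Translate a pattern into a regex; W becomes [AT], brackets are dropped."""
    body = pattern.replace("[", "").replace("]", "")
    return "".join("[AT]" if symbol == "W" else symbol for symbol in body)


def has_exact_match(pattern: str, sequence: str) -> bool:
    """Return True if the pattern occurs in the sequence with no mismatches."""
    return re.search(iupac_to_regex(pattern), sequence.upper()) is not None


# Made-up 3'-UTR fragment: the ATTTA core is followed by TAAA, not TTTA.
example_utr = "GGCATATATTTATAAAGGC"

print(has_exact_match(OLD_FORM, example_utr))      # old form misses it
print(has_exact_match(RELAXED_FORM, example_utr))  # relaxed form captures it
```

Because T is one of the bases allowed by W, every sequence matched by the old form is also matched by the relaxed form, so the change can only add hits.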

 

Expansion to non-human organisms:

The current release of ARED adds a database of AREs in the widely analyzed genomes of mouse (Mus musculus) and rat (Rattus norvegicus). We chose these two organisms due to their great importance as lab experimental systems and also because of the relatively rich repertoire of transcript sequences available for these two organisms. Future expansions of the collection to include other organisms will only require the availability of a reasonably well catalogued transcriptome rich in 3’-complete transcript sequences.

 

Use of ENSEMBL as the data source and streamlining of the analysis pipeline:

Earlier releases of ARED mainly used UNIGENE gene clusters to obtain consensus full-length gene representatives on which the analysis was based. That set was further augmented with RefSeq sequences and mRNA records from GenBank. Complex filtration steps were applied to obtain unique gene clusters, correctly identify the 3’-end of the sequences, extract the 3’-UTR coordinates from annotation and assess its completeness. In addition, UNIGENE clusters are in a constant state of change and do not provide a permanent reference to the sequences, making future back-referencing difficult. Such issues are naturally to be expected due to the open, automated, and generally uncontrolled nature of the UNIGENE and GenBank collections.

 

In this release we decided to address most of the above shortcomings by using the ENSEMBL gene collection. This dataset is easily downloadable via BioMart, provides a reasonable view of the variation in a gene’s transcripts, and comes from a complete and reliable annotation pipeline, allowing us to use its end product and focus directly on our target of interest. We used ENSEMBL’s “External database” annotation to link back to RefSeq, UNIGENE and ENTREZ and to allow searching by these identifiers too. The annotation also allowed us to link in a straightforward manner to the Gene Ontology (GO) annotations.

 

A gene-oriented, transcript-based analysis:

Recent biological knowledge has dispelled the older “one gene, one transcript, one protein” dogma. We now understand that variation plays an important role in biological systems. Accordingly, gene databases now use the term gene to refer to a cluster of transcriptional variants (linked to a common genomic locus). The term transcript is commonly used to refer to a specific variant corresponding to the physical entity. This distinction is necessary because in ARED we classify genes but analyze transcripts. A gene is assigned the most stringent ARE class among its transcripts and is classified as “Non ARE-containing” only if none of its transcripts contains an ARE.
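
The gene-level rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that a lower group number is the more stringent class (group 1 has the longest pattern in Table 2), which the text does not state explicitly.

```python
from typing import Iterable, Optional


def classify_gene(transcript_groups: Iterable[Optional[int]]) -> Optional[int]:
    """Return the gene-level ARE group, or None for "Non ARE-containing".

    Each element is the ARE group (1-5) found for one transcript of the gene,
    or None if that transcript has no ARE.
    ASSUMPTION: a lower group number means a more stringent class.
    """
    groups = [group for group in transcript_groups if group is not None]
    return min(groups) if groups else None


# A gene with three transcripts: one without an ARE, one group 5, one group 3.
print(classify_gene([None, 5, 3]))
# A gene whose transcripts all lack an ARE.
print(classify_gene([None, None]))
```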

 

Clearer negative results and transcript-based evidence:

A gene is ARE-containing if it is protein coding and has the minimal ARE motif in its 3’-UTR. Consequently, a negative result for a user’s gene query can have multiple causes: some related to the query, some to ARED and some to the original data. In earlier versions, the user had no way to distinguish these causes and the negative response was generally unsatisfying. The current release addresses this issue by including all genes in the current release of ENSEMBL in ARED and checking every transcript in three steps: the presence of a 3’-UTR (the transcript is protein coding and a 3’-UTR is present in the annotation), the completeness of the 3’-UTR (the poly-adenylation signal, AWTAAA, occurs in the last 50 bases of the 3’-UTR), and finally the presence of ARE motifs in the 3’-UTR.
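
The completeness check described above can be sketched as follows. This is a minimal illustration, assuming that W is the IUPAC code for A or T, that sequences are written in DNA notation, and that the whole signal must lie within the last 50 bases. The function and constant names are hypothetical.

```python
import re

POLYA_SIGNAL = re.compile(r"A[AT]TAAA")  # AWTAAA with W expanded to A or T
WINDOW = 50                              # number of 3'-terminal bases examined


def utr_is_complete(utr: str) -> bool:
    """Return True if the poly-adenylation signal lies in the last 50 bases."""
    tail = utr.upper()[-WINDOW:]
    return POLYA_SIGNAL.search(tail) is not None


# Signal 20 bases before the end of the 3'-UTR: counted as complete.
print(utr_is_complete("G" * 100 + "AATAAA" + "G" * 20))
# Signal more than 50 bases before the end: not counted as complete.
print(utr_is_complete("AATAAA" + "G" * 60))
```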

 

Thus, if a gene requested by the user has at least one protein coding transcript with an annotated 3’-UTR and an identifiable ARE motif (even if the 3’-UTR is not complete, that is, it lacks a polyA signal motif), the relevant transcript is reported as evidence for the positive classification.

 

If, on the other hand, the gene requested by the user was analyzed by ARED, and all of its protein coding transcripts contained 3’-UTRs that were complete but did not contain the ARE motif, then one of those transcripts is presented as strong evidence for the negative classification.

 

Lastly, if the gene requested by the user was analyzed, yet it had no protein coding transcripts, or none of its protein coding transcripts had a 3’-UTR, or none of its protein coding, 3’-UTR-containing transcripts was either complete or contained an ARE motif, then that gene is given a weak negative classification and is reported as non-ARE containing but without a supporting transcript as evidence. This is simply an acknowledgment of the fact that the transcript collection in ENSEMBL (or in any other database, for that matter) is incomplete, and thus an as yet unknown transcript of the gene may contain the ARE motif.

 

Of course, if the user’s query did not match any gene in our database (either because it does not exist or was not in the release of ENSEMBL we downloaded) the user receives a “0 matched records” response.
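
The four possible outcomes of a query can be summarized in a short Python sketch. It is not ARED's actual code: the `Transcript` record, the `classify_query` function and the outcome labels are hypothetical. The text also does not say how to treat a gene that has both complete and incomplete ARE-free 3’-UTRs, so the sketch assumes that one complete ARE-free 3’-UTR is enough for a strong negative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Transcript:
    transcript_id: str
    protein_coding: bool
    has_utr3: bool       # a 3'-UTR is present in the annotation
    utr3_complete: bool  # polyA signal found near the 3' end
    has_are: bool        # an ARE motif was found in the 3'-UTR


def classify_query(gene_id: str,
                   database: Dict[str, List[Transcript]]
                   ) -> Tuple[str, Optional[str]]:
    """Return (outcome, evidence transcript id or None) for a gene query."""
    if gene_id not in database:
        return ("0 matched records", None)
    # Only protein coding transcripts with an annotated 3'-UTR are analyzable.
    candidates = [t for t in database[gene_id]
                  if t.protein_coding and t.has_utr3]
    for t in candidates:
        if t.has_are:  # completeness is not required for a positive call
            return ("positive", t.transcript_id)
    for t in candidates:
        if t.utr3_complete:  # ASSUMPTION: one complete ARE-free UTR suffices
            return ("strong negative", t.transcript_id)
    return ("weak negative", None)


db = {
    "geneA": [Transcript("A-1", True, True, False, True)],
    "geneB": [Transcript("B-1", True, True, True, False)],
    "geneC": [Transcript("C-1", False, False, False, False)],
}
for gene in ("geneA", "geneB", "geneC", "geneD"):
    print(gene, classify_query(gene, db))
```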

 

Other enhancements:

The user interface and the collection of small scripts used to process and prepare the data have been rewritten as a single cohesive application. Although the overall user experience is largely unchanged from earlier releases, the underlying processing pipeline was substantially revised. We hope this will both improve the quality of our data and ease future updates and revisions.

TABLES

 

Table 1: Group distribution in the three organisms

 

ARE Group    Human               Mouse               Rat
1            39 (0.12%)          27 (0.10%)          4 (0.01%)
2            53 (0.17%)          48 (0.17%)          12 (0.04%)
3            351 (1.11%)         308 (1.10%)         103 (0.38%)
4            316 (1.00%)         214 (0.76%)         75 (0.27%)
5            1,354 (4.30%)       884 (3.16%)         332 (1.22%)
No ARE       29,411 (93.30%)     26,494 (94.71%)     26,776 (98.07%)
Total        31,524              27,975              27,302

 

 

Table 2: Patterns used to identify ARE clusters

 

Group    Pattern
1        ATTTATTTATTTATTTATTTA
2        ATTTATTTATTTATTTA
3        ATTTATTTATTTA
4        WWA[TTTATTT]AWW
5        WWWT[ATTTA]TWWW
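
As an illustration of how the patterns in Table 2 might be applied to a 3’-UTR sequence, here is a minimal Python sketch. It rests on several assumptions that the text does not spell out: W is the IUPAC code for A or T, groups 1 to 3 must match exactly, groups 4 and 5 tolerate one mismatch outside the bracketed core, and groups are tested from 1 to 5 so that the first hit is the most stringent. All function names are hypothetical, and this is not ARED's actual implementation.

```python
from typing import List, Optional, Tuple

PATTERNS = {
    1: "ATTTATTTATTTATTTATTTA",
    2: "ATTTATTTATTTATTTA",
    3: "ATTTATTTATTTA",
    4: "WWA[TTTATTT]AWW",
    5: "WWWT[ATTTA]TWWW",
}


def parse(pattern: str) -> List[Tuple[str, bool]]:
    """Split a pattern into (symbol, is_inside_brackets) pairs."""
    symbols, in_core = [], False
    for ch in pattern:
        if ch == "[":
            in_core = True
        elif ch == "]":
            in_core = False
        else:
            symbols.append((ch, in_core))
    return symbols


def window_matches(symbols, window: str, max_flank_mismatches: int) -> bool:
    """Compare one sequence window against a parsed pattern."""
    mismatches = 0
    for (symbol, in_core), base in zip(symbols, window):
        ok = base in "AT" if symbol == "W" else base == symbol
        if not ok:
            if in_core:  # the bracketed core must match exactly
                return False
            mismatches += 1
            if mismatches > max_flank_mismatches:
                return False
    return True


def find_group(utr: str) -> Optional[int]:
    """Return the first (assumed most stringent) matching group, or None."""
    seq = utr.upper()
    for group in sorted(PATTERNS):
        symbols = parse(PATTERNS[group])
        # ASSUMPTION: only bracketed patterns get the one-mismatch allowance.
        allowed = 1 if "[" in PATTERNS[group] else 0
        width = len(symbols)
        for start in range(len(seq) - width + 1):
            if window_matches(symbols, seq[start:start + width], allowed):
                return group
    return None


print(find_group("GGGATTTATTTATTTAGGG"))  # contains the group 3 pattern
print(find_group("CCGTATTTATTTATTCC"))    # group 4 with one flank mismatch
print(find_group("GGGGCCCCGGGGCCCC"))     # no ARE
```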