ARED Integrated:

 

The three AU-rich databases (ARED) were merged into one database, following the structure of the latest ARED release (ARED3). The new product was named ARED-Integrated.

 

Every effort was made to assign specific AU-rich groups to pre-existing entries that had not been explicitly assigned one (bearing in mind that these group assignments reflect the AU-rich motifs used at the time each database was built). The program FindPatterns+ was used to assign the missing groups using the AU-rich patterns described earlier. Where no match was found, the program was run again with the relaxed AU-rich patterns (i.e. groups 4b and 5b) to fill in some of the gaps. Where group assignments differed between ARED 2 and ARED 3, the ARED 3 assignment took precedence.
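
The fallback and precedence rules above can be sketched as follows. This is only an illustration, not the in-house software: the function names, the `Matcher` type and the toy matchers are hypothetical, and the toy matchers are placeholders rather than the real strict or relaxed AU-rich patterns.

```python
from typing import Callable, Optional

# A matcher takes a sequence and returns a group label, or None if nothing matches.
# (Hypothetical interface; the real searches were run with FindPatterns+.)
Matcher = Callable[[str], Optional[str]]


def fill_missing_group(sequence: str, strict: Matcher, relaxed: Matcher) -> Optional[str]:
    """Try the strict patterns first; use the relaxed ones only when nothing matched."""
    group = strict(sequence)
    if group is None:
        group = relaxed(sequence)
    return group


def resolve_group(ared2: Optional[str], ared3: Optional[str]) -> Optional[str]:
    """When both releases assign a group, the ARED 3 assignment takes precedence."""
    return ared3 if ared3 is not None else ared2


# Toy matchers for illustration only; they are NOT the real AU-rich patterns.
def toy_strict(seq: str) -> Optional[str]:
    return "5" if "ATTTA" in seq else None


def toy_relaxed(seq: str) -> Optional[str]:
    return "5b" if "ATTT" in seq else None


if __name__ == "__main__":
    print(fill_missing_group("GGATTTGG", toy_strict, toy_relaxed))
    print(resolve_group("4", "3"))
```

Group labels are kept as strings here so that the relaxed labels (4b, 5b) fit the same field as the numeric groups.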

 

ARED-Integrated was also cross-referenced with Unigene (release 190), Gene Ontology (GO; through Entrez, release dated 20/04/2006), and the GenBank definition lines (release 153).

 

A new web application was developed that included all of the entries in ARED-Integrated plus ARED 4.0.

 

NCBI RefSeq for Homo sapiens (release 16) was used to update the AU-rich database (ARED). The 3'-UTR ends of all RefSeq sequences were retrieved; where a RefSeq entry had no CDS feature, the whole sequence was used instead. The sequences were cleaned of trailing polyA tails (10 or more bases long, within the last 50 bases from the 3'-end) and searched for AU-rich patterns. The program FindPatterns+ (part of GCG Version 11.0, Accelrys Inc., San Diego, CA) was used to search for the different groups of AU-rich pentamer (AUUUA) repeats. For AU-rich cluster groups 1, 2, and 3 (5 or more, 4, and 3 overlapping pentamer repeats, respectively), a single mismatch was allowed anywhere in the aligned sequence. For groups 4 and 5 (2 and 1 pentamer repeats, respectively), a single mismatch was allowed in the flanking regions only. The matching list of RefSeq sequences was then filtered on the presence or absence of the polyA signal (AWTAAA) within the last 50 bases from the 3'-end.
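
The steps in this pipeline can be approximated in a few lines of Python. This is a simplified sketch, not a reimplementation of FindPatterns+: it matches the pentamer repeats exactly (no mismatch tolerance and no flanking-region patterns), it assumes the sequence is cut at the first run of 10 or more A's found in the last 50 bases, and it checks the polyA signal on the trimmed sequence. All function names are hypothetical.

```python
import re

POLYA_TAIL = re.compile(r"A{10,}")        # run of 10 or more A's
POLYA_SIGNAL = re.compile(r"A[AT]TAAA")   # AWTAAA, where IUPAC W = A or T
PENTAMER_RUN = re.compile(r"A(?:TTTA)+")  # n overlapping ATTTA repeats = A + n * TTTA


def trim_polya_tail(seq: str, window: int = 50) -> str:
    """Cut the sequence at the first polyA run starting in the last `window` bases."""
    seq = seq.upper().replace("U", "T")   # work in DNA letters throughout
    start = max(0, len(seq) - window)
    match = POLYA_TAIL.search(seq, start)
    return seq[:match.start()] if match else seq


def max_overlapping_pentamers(seq: str) -> int:
    """Largest number of exactly matching, overlapping ATTTA repeats in the sequence."""
    best = 0
    for match in PENTAMER_RUN.finditer(seq):
        best = max(best, (len(match.group()) - 1) // 4)
    return best


def assign_group(n_pentamers: int):
    """Map a repeat count to groups 1-5 (5+, 4, 3, 2, 1 repeats); None if no repeat."""
    if n_pentamers >= 5:
        return 1
    return {4: 2, 3: 3, 2: 4, 1: 5}.get(n_pentamers)


def has_polya_signal(seq: str, window: int = 50) -> bool:
    """True if AWTAAA occurs within the last `window` bases."""
    return POLYA_SIGNAL.search(seq[-window:]) is not None


if __name__ == "__main__":
    # Made-up sequence for illustration, not a real RefSeq 3'-UTR.
    utr = "GC" * 20 + "AUUUAUUUAUUUA" + "GCGC" + "AAUAAA" + "GC" * 5 + "A" * 15
    trimmed = trim_polya_tail(utr)
    print(assign_group(max_overlapping_pentamers(trimmed)), has_polya_signal(trimmed))
```

Because overlapping repeats share their terminal A, a run of n pentamers is 4n + 1 bases long, which is why the count is recovered as (length - 1) // 4.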

 

About 1000 of the matched AU-rich sequences were new to ARED and were therefore added to the database. These are labeled as version 4.0 in the ARED-Integrated web application. In addition, ARED 4.0 was cross-referenced with Unigene (release 190), Gene Ontology (GO; through Entrez, release dated 20/04/2006), and the GenBank definition lines (release 153).

 

The three earlier AU-rich databases (ARED 1.0, 2.0, and 3.0) and ARED 4.0 were merged into one database, following the structure of the latest ARED release (ARED3). The new product was named ARED-Integrated.

AU-rich sequences in ARED have always been grouped by their respective Unigene clusters. As is well known, Unigene clustering changes between Unigene releases, and as a result the Unigene assignment of a particular AU-rich sequence can differ between ARED releases. However, the new web application links previous ARED entries to their respective ARED databases (i.e. versions 2 and 3) for comparison. Unless otherwise specified, all software used for these projects was developed in-house (Java release 1.5).