Non-coding RNA Overview
Most non-coding RNA (ncRNA) annotation in Ensembl is generated using RFAM. A small hand-picked set is also available. This ncRNA set, as well as a detailed description of the annotation methods, can be obtained from ftp.genetics.wustl.edu.
The following non-coding RNA gene types are annotated, along with pseudogenes:
- tRNA: nuclear transfer RNA
- Mt-tRNA: mitochondrially-derived tRNA located in the nuclear genome
- rRNA: ribosomal RNA
- scRNA: small cytoplasmic RNA
- snRNA: small nuclear RNA
- snoRNA: small nucleolar RNA
- miRNA: microRNA precursors
- misc_RNA: miscellaneous other RNA
Annotation Details
Most ncRNA is annotated by aligning genomic sequence against RFAM using BLASTN. The BLAST hits are clustered, filtered by E-value, and used to seed Infernal searches of each locus with the corresponding RFAM covariance models. Seeding the searches in this way reduces the search space, because scanning the entire genome with every RFAM covariance model would be extremely CPU-intensive. The resulting BLAST hits are then used as supporting evidence for the ncRNA genes.
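The filter-and-cluster step can be illustrated with a minimal Python sketch. It is not the Ensembl pipeline code: the `Hit` record, the `cluster_hits` function, the `max_evalue` cutoff of 1e-5 and the `padding` parameter are all hypothetical names and values chosen for illustration. The sketch discards weak hits and merges overlapping hits for the same family into one locus, which is the kind of region a covariance-model search could then be restricted to.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One BLASTN hit (hypothetical record, not an Ensembl class)."""
    family: str    # RFAM family the genomic sequence matched
    start: int     # genomic start coordinate
    end: int       # genomic end coordinate, inclusive
    evalue: float  # BLAST E-value of the hit


def cluster_hits(hits: List[Hit], max_evalue: float = 1e-5,
                 padding: int = 0) -> List[Hit]:
    """Drop hits above the E-value cutoff, then merge overlapping
    hits of the same family into single seed loci."""
    # The cutoff is an assumed value, not the one Ensembl uses.
    kept = sorted((h for h in hits if h.evalue <= max_evalue),
                  key=lambda h: (h.family, h.start))
    clusters: List[Hit] = []
    for h in kept:
        last = clusters[-1] if clusters else None
        if last and last.family == h.family and h.start <= last.end + padding:
            # Extend the current locus and keep its best E-value.
            last.end = max(last.end, h.end)
            last.evalue = min(last.evalue, h.evalue)
        else:
            # Copy the hit so the caller's input is not modified.
            clusters.append(Hit(h.family, h.start, h.end, h.evalue))
    return clusters
```

Given two overlapping hits for one family, a weak hit, and a hit for a second family, the function returns two loci: the merged region for the first family and the single region for the second. Clustering per family keeps each locus tied to one covariance model.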
miRNA is predicted by running BLASTN on genomic sequence slices against miRBase sequences. The BLAST hits are clustered and filtered by E-value, and the aligned genomic sequence is then checked for possible secondary structure using RNAfold. If there is evidence that the genomic sequence could form a stable hairpin structure, the locus is used to create a miRNA gene model. The resulting BLAST hit is used as supporting evidence for the miRNA gene.
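As a rough illustration of the hairpin check, the Python sketch below tests a predicted structure written in dot-bracket notation, the format RNAfold reports alongside a minimum free energy. The function name `is_single_hairpin` and the thresholds `min_pairs` and `max_mfe` are assumptions made for this example; the criteria Ensembl actually applies are not stated here. The sketch accepts a structure only if it is one unbranched stem-loop with enough base pairs and a sufficiently low free energy.

```python
def is_single_hairpin(structure: str, mfe: float,
                      min_pairs: int = 16, max_mfe: float = -20.0) -> bool:
    """Return True if a dot-bracket structure is one unbranched
    stem-loop that passes the (assumed) stability thresholds."""
    depth = 0        # currently open base pairs
    pairs = 0        # total base pairs seen
    closing = False  # set once the stem has started to close
    for ch in structure:
        if ch == "(":
            if closing:
                # A new stem opens after closing began: branched or
                # multi-hairpin structure, so not a single hairpin.
                return False
            depth += 1
            pairs += 1
        elif ch == ")":
            closing = True
            depth -= 1
            if depth < 0:
                return False  # unbalanced structure
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    # Balanced, long enough, and stable enough (kcal/mol).
    return depth == 0 and pairs >= min_pairs and mfe <= max_mfe
```

A structure of 20 opening brackets, a 6-base loop and 20 closing brackets with a free energy of -30.0 passes the check, while two adjacent short hairpins or the same structure with a free energy of -5.0 are rejected.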

