Transcriptome with Alternative splicing alignment algorithm
The first step is a basic alignment of the reads to the whole genome. An attempt is first made to align each entire read to the reference sequence without any mismatches. Short seed sequences within the reads are then used to align the reads to the reference sequence.
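The two passes of this first step can be illustrated with a small sketch. It assumes a plain-string genome and uses a simple seed-and-extend fallback with non-overlapping seeds and a mismatch cap. The function names and the `seed_len` and `max_mismatches` parameters are hypothetical and are not taken from the tool being described.

```python
# Minimal sketch: exact whole-read alignment first, then a seed-based fallback.
# align_exact, align_with_seeds, align_read, seed_len and max_mismatches are
# hypothetical names chosen for illustration only.


def align_exact(read: str, genome: str) -> list[tuple[int, int]]:
    """Return (position, 0) for every place the whole read matches with no mismatches."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append((pos, 0))
        pos = genome.find(read, pos + 1)
    return hits


def align_with_seeds(read: str, genome: str, seed_len: int = 4,
                     max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Place the read using exact hits of short seeds, then score the full read."""
    candidates = set()
    # Non-overlapping seeds taken from within the read.
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        pos = genome.find(seed)
        while pos != -1:
            start = pos - offset  # implied start of the whole read
            if start >= 0 and start + len(read) <= len(genome):
                candidates.add(start)
            pos = genome.find(seed, pos + 1)
    results = []
    for start in sorted(candidates):
        window = genome[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            results.append((start, mismatches))
    return results


def align_read(read: str, genome: str, seed_len: int = 4,
               max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Try an exact whole-read match first; fall back to seeds only if it fails."""
    exact = align_exact(read, genome)
    if exact:
        return exact
    return align_with_seeds(read, genome, seed_len, max_mismatches)
```

For example, with `genome = "ACGTACGGTTCAGGCATTACG"`, the read `"GGTTCAG"` is placed exactly at position 6, while `"GGTACAG"` (one substitution) is only found by the seed pass.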
The second step is alignment to exon junctions, using a reference sequence of exon-exon junctions built from annotated genes. Any reads that could not be aligned to the genomic reference sequence are aligned to this junction reference, and the resulting positions are translated back to genomic reference positions. This step aligns reads more completely, especially reads from regions near the ends of exons.
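A junction reference and the translation back to genomic positions might look like the sketch below. It assumes each annotated transcript is given as an ordered list of half-open `(start, end)` exon intervals, that each junction sequence joins `flank` bases from each side of a splice site, and that reads are placed by exact matching only. All names are hypothetical.

```python
# Minimal sketch of a junction reference built from annotated exons.
# build_junction_reference, junction_to_genomic, align_to_junctions and the
# flank parameter are hypothetical; exons are half-open (start, end) intervals.


def build_junction_reference(genome, transcripts, flank):
    """Join the last `flank` bases of an exon to the first `flank` bases of the next one."""
    seen = {}
    for exons in transcripts:
        for (s1, e1), (s2, e2) in zip(exons, exons[1:]):
            left = (max(s1, e1 - flank), e1)   # end of the upstream exon
            right = (s2, min(e2, s2 + flank))  # start of the downstream exon
            if (left, right) not in seen:      # several transcripts may share a junction
                seen[(left, right)] = genome[left[0]:left[1]] + genome[right[0]:right[1]]
    return [(seq, left, right) for (left, right), seq in seen.items()]


def junction_to_genomic(offset, left, right):
    """Translate an offset in a junction sequence back to a genomic coordinate."""
    left_len = left[1] - left[0]
    if offset < left_len:
        return left[0] + offset
    return right[0] + (offset - left_len)


def align_to_junctions(read, junctions):
    """Return, for each junction hit, the genomic coordinate of every read base."""
    hits = []
    for seq, left, right in junctions:
        left_len = left[1] - left[0]
        pos = seq.find(read)
        while pos != -1:
            # Keep only hits that actually cross the splice site.
            if pos < left_len < pos + len(read):
                hits.append([junction_to_genomic(pos + i, left, right)
                             for i in range(len(read))])
            pos = seq.find(read, pos + 1)
    return hits
```

With a toy genome `"GATTACA" + "ttttttt" + "CCGGAAT"` (lowercase marking the intron) and exons `(0, 7)` and `(14, 21)`, the read `"ACACCG"` maps to genomic positions 4, 5, 6, 14, 15, 16, skipping the intron.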
The third step is detecting and linking exons. Potential exon regions are recorded, and a link between two exons is recorded if the same read at least partially covers both. Several filtering steps are then carried out to remove false positives.
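Exon detection and linking can be sketched as follows, assuming each aligned read is represented as a list of half-open genomic blocks (a spliced read has more than one block). Covered blocks are merged into candidate exon regions, and every pair of regions touched by the same read gets a link. The text does not say which filters are used, so the single minimum-read-support threshold here (`min_support`) is only an illustrative, hypothetical example of a false-positive filter.

```python
# Minimal sketch of exon detection and linking from aligned read blocks.
# detect_regions, link_regions and min_support are hypothetical names.
from collections import Counter
from itertools import combinations


def detect_regions(alignments):
    """Merge all covered blocks into candidate exon regions (half-open intervals)."""
    blocks = sorted(b for read in alignments for b in read)
    regions = []
    for start, end in blocks:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current region
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


def link_regions(alignments, regions, min_support=2):
    """Count reads that touch two regions; keep links with enough supporting reads."""
    counts = Counter()
    for read in alignments:
        touched = sorted({i for i, (rstart, rend) in enumerate(regions)
                          for s, e in read if s < rend and rstart < e})
        for pair in combinations(touched, 2):
            counts[pair] += 1
    # Illustrative filter: drop links seen in fewer than min_support reads.
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Here links are keyed by region indices in genomic order, so a link `(0, 1)` means the first and second detected regions were covered by the same reads.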
The fourth step is alignment to the detected transcripts. Based on the link information, a reference sequence of mRNA transcripts (that is, a reference without intron sequences) is generated. The original reads are aligned to this reference, and the coordinates are translated back to genomic positions.
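One way to turn link information into intron-free transcript sequences is to follow chains of linked regions and concatenate their sequences, keeping an offset table for the translation back to the genome. The sketch below assumes links are pairs of region indices in genomic order (so the link graph has no cycles), enumerates every maximal chain as a candidate transcript, and aligns reads by exact matching. These choices and all names are assumptions, not the described tool's procedure.

```python
# Minimal sketch: candidate transcripts from linked regions, then alignment
# to the intron-free sequence with translation back to genomic coordinates.
# transcript_paths, build_transcript, align_to_transcript are hypothetical.


def transcript_paths(n_regions, links):
    """Enumerate maximal chains of linked regions (links are (i, j) with i < j)."""
    out = {i: [] for i in range(n_regions)}
    has_incoming = set()
    for a, b in links:
        out[a].append(b)
        has_incoming.add(b)
    paths = []

    def walk(path):
        nxt = out[path[-1]]
        if not nxt:
            paths.append(path)
            return
        for b in sorted(nxt):
            walk(path + [b])

    for i in range(n_regions):
        if i not in has_incoming:
            walk([i])
    return paths


def build_transcript(genome, exons):
    """Concatenate exon sequences and record (transcript_start, genomic_start, genomic_end)."""
    parts, offsets, pos = [], [], 0
    for start, end in exons:
        offsets.append((pos, start, end))
        parts.append(genome[start:end])
        pos += end - start
    return "".join(parts), offsets


def align_to_transcript(read, seq, offsets):
    """Return the genomic coordinate of every read base for each exact hit."""
    def to_genomic(t_pos):
        for t_start, g_start, g_end in offsets:
            if t_start <= t_pos < t_start + (g_end - g_start):
                return g_start + (t_pos - t_start)
        raise ValueError(f"offset {t_pos} is outside the transcript")

    hits = []
    pos = seq.find(read)
    while pos != -1:
        hits.append([to_genomic(pos + i) for i in range(len(read))])
        pos = seq.find(read, pos + 1)
    return hits
```

Enumerating every chain is exponential in the worst case; it only serves to show how links define intron-free sequences on a toy example.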
After alignment is completed, regions (covered or annotated) and links are called and then compared to known transcripts so that the regions and links can be classified.
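The comparison against known transcripts could be as simple as the sketch below. It assumes regions are half-open intervals, links are represented by their genomic splice sites `(donor, acceptor)`, and the category labels are invented for illustration; the text does not name the actual classes used.

```python
# Minimal sketch of classifying called regions and links against annotation.
# classify_regions, classify_links and the category labels are hypothetical.


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_regions(covered, annotated):
    """Label regions as covered and annotated, covered only, or annotated only."""
    result = {}
    for r in covered:
        if any(overlaps(r, a) for a in annotated):
            result[r] = "covered+annotated"
        else:
            result[r] = "covered only"
    for a in annotated:
        if not any(overlaps(a, r) for r in covered):
            result[a] = "annotated only"
    return result


def classify_links(links, annotated_junctions):
    """A link is 'known' if its (donor, acceptor) pair appears in the annotation."""
    return {j: ("known" if j in annotated_junctions else "novel") for j in links}
```

For instance, a covered region that overlaps no annotated exon would be reported as "covered only", which is where a candidate new exon would show up.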