Preloaded Reference Alignment
To align reads to a preloaded reference such as the human, mouse, or rat genome, NextGENe uses a Preloaded Index Alignment algorithm. This algorithm employs a suffix array that is represented by the Burrows-Wheeler Transform (BWT). A rank algorithm allows the software to traverse the suffix array to find the best matching location for each read. In addition to the BWT, the software stores the genome position of every fourth base pair, which allows it to keep track of genome locations while traversing the reference genome.
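The general technique described above (a BWT searched with rank counts, plus genome positions stored at intervals) can be illustrated with a small sketch. This is not NextGENe's implementation: the function names `build_index` and `find_exact` are hypothetical, the suffix array is built naively, and the `sample_step=4` default is only one reading of "every fourth base pair".

```python
def build_index(reference, sample_step=4):
    """Build a toy BWT index (illustrative only; naive O(n^2 log n) construction)."""
    text = reference + "$"                      # "$" sorts before A, C, G, T
    n = len(text)
    suffix_array = sorted(range(n), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first[c] = number of characters in the text that sort before c
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i] = rank of c: how many times c occurs in bwt[:i]
    occ = {ch: [0] * (n + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c, column in occ.items():
            column[i + 1] = column[i] + (c == ch)

    # Keep genome positions only for suffixes starting at multiples of sample_step
    sampled = {row: pos for row, pos in enumerate(suffix_array)
               if pos % sample_step == 0}
    return {"bwt": bwt, "first": first, "occ": occ, "sampled": sampled}


def find_exact(read, index):
    """Return sorted reference positions where the read matches exactly."""
    bwt, first = index["bwt"], index["first"]
    occ, sampled = index["occ"], index["sampled"]

    # Backward search: narrow the suffix array interval one base at a time
    lo, hi = 0, len(bwt)
    for ch in reversed(read):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []

    # Recover positions: step backwards through the text until a sampled row
    positions = []
    for row in range(lo, hi):
        steps = 0
        while row not in sampled:
            ch = bwt[row]
            row = first[ch] + occ[ch][row]
            steps += 1
        positions.append(sampled[row] + steps)
    return sorted(positions)
```

For example, `find_exact("ACG", build_index("ACGTACGTTACG"))` reports every exact occurrence of the read. Storing only every fourth position trades a few extra rank lookups per hit for a smaller position table.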
NextGENe first attempts to match the entire read exactly to the reference. A read can match at a single position or at multiple positions. To align reads that match exactly at more than one position, set the Allowable Ambiguous Alignments setting to a value that is greater than one, with 50 being the recommended value. (See Allowable Ambiguous Alignments.) If this option is set to a value of one, the read is aligned to the first exact match position from the beginning of the reference. If this option is set to a value of zero, then all reads that match perfectly at more than one location are discarded.
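The effect of the Allowable Ambiguous Alignments setting on a read with several exact matches can be sketched as follows. The function name `select_positions` is hypothetical. The text above does not say what happens when a read matches at more positions than the setting allows, so the sketch simply keeps the first positions up to the limit; treat that branch as an assumption.

```python
def select_positions(exact_hits, allowable_ambiguous):
    """Choose which exact-match positions to keep for one read (illustrative)."""
    hits = sorted(exact_hits)          # order from the beginning of the reference
    if len(hits) <= 1:
        return hits                    # unique match or no match: nothing to decide
    if allowable_ambiguous == 0:
        return []                      # ambiguous reads are discarded
    if allowable_ambiguous == 1:
        return hits[:1]                # first exact match position only
    # ASSUMPTION: when hits exceed the limit, keep the first `allowable_ambiguous`
    return hits[:allowable_ambiguous]
```

With hits at positions 900, 15, and 300, a setting of zero keeps nothing, a setting of one keeps position 15, and a setting of 50 keeps all three.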
For reads that cannot be matched exactly, NextGENe tries to match the entire read with an increasing number of mismatches, starting at one mismatch and continuing up to the maximum number of allowable mismatches that you specify. (See Allowable Mismatched Bases.) For reads that still cannot be matched, seeds that are shorter than the read length are used to identify the best matching position within the genome. After finding the best match, a dedicated NextGENe algorithm extends the alignment to cover the entire read, which, in turn, allows individual reads to be aligned with indels and mismatches.
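The escalation described above (full-length matching with a growing mismatch budget, then a fallback to shorter seeds) can be sketched in a simplified form. This is an illustration under stated assumptions, not the NextGENe algorithm: it scans the reference directly instead of using the index, the names `match_with_mismatches`, `seed_candidates`, and `align` are hypothetical, seeds are non-overlapping and vote for a candidate start position, and the final extension step that handles indels is not implemented.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching bases between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_with_mismatches(read, ref, max_mismatches):
    """Return (mismatches, positions) for the smallest mismatch count that hits."""
    m = len(read)
    dists = [(hamming(read, ref[i:i + m]), i) for i in range(len(ref) - m + 1)]
    for k in range(max_mismatches + 1):        # k = 0 is the exact-match pass
        hits = [i for d, i in dists if d == k]
        if hits:
            return k, hits
    return None


def seed_candidates(read, ref, seed_len):
    """Let each exact seed hit vote for the read start position it implies."""
    votes = Counter()
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        start = ref.find(seed)
        while start != -1:
            votes[start - offset] += 1
            start = ref.find(seed, start + 1)
    return votes.most_common()


def align(read, ref, max_mismatches, seed_len=4):
    """Try full-length matching first, then fall back to seed voting."""
    full = match_with_mismatches(read, ref, max_mismatches)
    if full is not None:
        return ("full", full[0], full[1])
    candidates = seed_candidates(read, ref, seed_len)
    if not candidates:
        return ("unaligned", None, [])
    best_start, seed_votes = candidates[0]
    # A real aligner would now extend from best_start, allowing indels.
    return ("seed", seed_votes, [best_start])
```

A read that carries an inserted base cannot be placed by mismatch counting alone, because every base after the insertion is shifted. Its seeds on either side of the insertion still match exactly, which is why the seed stage can recover a candidate position for the extension step to refine.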