Skeleton assembly method for Roche/454 data

The Skeleton assembly method uses seed keys, which are the sequences between homopolymers (runs of three or more identical nucleotides), to look for overlap between reads. The average distance between homopolymers is 16 bp, so a read with a length of 256 bp contains an average of 16 seed keys, but much longer stretches without homopolymers can occur. When this is the case, seed keys are created between “AAT” or “TAA” sequences instead. By comparing reads at homopolymer, AAT, or TAA sequences instead of at every base position, processing time is significantly decreased. The Skeleton assembly method is recommended for Roche/454 reads or any other long-read dataset with an average read length of 70 bp or more.
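
The splitting described above can be sketched in Python. This is an illustration only, not NextGENe's implementation: the function names `split_between` and `seed_keys` are hypothetical, and so are the choices to count the stretches at the read ends as seed keys and to leave the delimiter bases (the homopolymer, AAT, or TAA) out of the keys. The `min_len` and `max_len` arguments stand in for the Seed Key Length bounds.

```python
import re

# Three or more identical nucleotides in a row (a homopolymer, as defined above).
HOMOPOLYMER = re.compile(r"([ACGT])\1{2,}")
# Fallback delimiters for stretches that contain no homopolymer.
FALLBACK = re.compile(r"AAT|TAA")


def split_between(pattern, sequence):
    """Return the non-empty stretches of `sequence` between matches of `pattern`."""
    pieces, start = [], 0
    for match in pattern.finditer(sequence):
        pieces.append(sequence[start:match.start()])
        start = match.end()
    pieces.append(sequence[start:])
    return [piece for piece in pieces if piece]


def seed_keys(read, min_len, max_len):
    """Split a read into candidate seed keys (assumed behavior, see lead-in)."""
    keys = []
    for segment in split_between(HOMOPOLYMER, read):
        if len(segment) > max_len:
            # Too long between homopolymers: split at AAT/TAA instead.
            candidates = split_between(FALLBACK, segment)
        else:
            candidates = [segment]
        # Keep only candidates inside the configured length range.
        keys.extend(k for k in candidates if min_len <= len(k) <= max_len)
    return keys


# "AAA" and "CCC" act as delimiters here.
print(seed_keys("ACGTAAAGCTAGCTCCCGATCG", min_len=4, max_len=10))
# ['ACGT', 'GCTAGCT', 'GATCG']

# No homopolymer in 16 bases, so the read is split at "AAT".
print(seed_keys("ACGTACGAATCGTACG", min_len=4, max_len=10))
# ['ACGTACG', 'CGTACG']
```

Only these short keys need to be indexed and compared across reads, rather than every base position, which is where the reduction in processing time comes from.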

Setting	Description
Seed Key Length >= [x] Bases, <= [y] Bases	Specifies the length range for seed key sequences. If the number of bases between homopolymers is greater than “y,” then seed keys are created between “AAT” or “TAA” sequences.
Seed Key Coverage >= [x], <= [y]	A seed key is used in the assembly only if the number of reads that match it falls within this range.
Auto Estimate	Select this option to have the software estimate the seed key coverage values. Note: If this option is selected, then the two options above are unavailable. Instead, NextGENe automatically calculates these values.
Assembled Contig Length to Output >= [x] Bases	Specifies the minimum length that a contig must have to be included in the Assembled Sequences output file. Any contigs that contain fewer than this number of bases are saved in a shortContigs.fasta file.
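
The Seed Key Coverage and Assembled Contig Length to Output settings both act as simple range filters. The Python sketch below shows one way such filters could work. It is not NextGENe's code: the function names `covered_seed_keys` and `partition_contigs` and the input data are hypothetical, and a read is assumed to "match" a seed key when the key occurs in it at least once.

```python
from collections import Counter


def covered_seed_keys(keys_per_read, min_cov, max_cov):
    """Keep seed keys whose read count lies within [min_cov, max_cov].

    `keys_per_read` holds one list of seed keys per read; a key is
    counted at most once per read (an assumption about "coverage").
    """
    counts = Counter()
    for keys in keys_per_read:
        counts.update(set(keys))
    return {key: n for key, n in counts.items() if min_cov <= n <= max_cov}


def partition_contigs(contigs, min_len):
    """Split contigs into those kept in the main output and the short ones."""
    kept = [c for c in contigs if len(c) >= min_len]
    short = [c for c in contigs if len(c) < min_len]
    return kept, short


keys_per_read = [["ACGT", "GCTA"], ["ACGT"], ["ACGT", "GCTA", "TTGA"]]
print(covered_seed_keys(keys_per_read, min_cov=2, max_cov=2))
# {'GCTA': 2}

print(partition_contigs(["ACGTACGT", "ACG"], min_len=5))
# (['ACGTACGT'], ['ACG'])
```

In this sketch the upper coverage bound discards keys seen in unusually many reads, and the lower bound discards keys seen in too few reads to support an overlap.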