Floton/Floton-PE assembly method for Roche/454 and Ion Torrent data

The Floton assembly method, developed by SoftGenetics, reduces the number of homopolymer errors, which are a common problem in flow-based sequencing technologies. The method converts each sequence into its original flows, each of which consists of a nucleotide and the number of consecutive calls for that nucleotide.

The Floton-PE method is identical to the Floton assembly method, but it is used solely for paired-end data.

Once the sequence data is in this format, homopolymer indels that were difficult to assemble become, in effect, SNPs in the base count, which allows most homopolymer errors to be corrected.
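The conversion can be illustrated with a minimal sketch. It is not the NextGENe implementation: it assumes that a flow can be modeled as a plain run-length pair (nucleotide, count), and the function names `to_flows` and `flow_differences` are hypothetical. It shows how a one-base homopolymer indel between two reads turns into a single count mismatch at one flow position.

```python
from itertools import groupby

def to_flows(seq):
    """Run-length encode a sequence into (nucleotide, count) pairs.

    Assumption: this simplified model ignores the instrument's actual
    flow order and only records consecutive calls per nucleotide.
    """
    return [(base, len(list(run))) for base, run in groupby(seq)]

def flow_differences(seq_a, seq_b):
    """Compare two reads flow by flow.

    Returns a list of (flow_position, nucleotide, count_a, count_b) for
    flows whose counts differ, or None when the nucleotide order of the
    two reads is not the same (so they cannot be compared flow by flow).
    """
    flows_a, flows_b = to_flows(seq_a), to_flows(seq_b)
    if [base for base, _ in flows_a] != [base for base, _ in flows_b]:
        return None
    return [
        (i, base, count_a, count_b)
        for i, ((base, count_a), (_, count_b)) in enumerate(zip(flows_a, flows_b))
        if count_a != count_b
    ]

read_a = "ACGTTTTGA"  # homopolymer run of four T
read_b = "ACGTTTGA"   # same read with one T missing (homopolymer indel)

print(to_flows(read_a))
print(flow_differences(read_a, read_b))
```

In base space the two reads differ in length and need a gapped alignment. In this flow representation they have the same number of flows and differ only in the count at one position, which is the SNP-like difference described above.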

In the Floton assembly method, reads are indexed with several flowmers. This index information is used during the first two steps of the three-step assembly process:

1. Condensation – Reads that share flowmer indexes are compared and used to generate high-quality consensus contigs. The same read can be used in multiple condensation contigs.

2. Combination – An iterative process checks for condensation contigs that contain the same reads for the purpose of discovering and merging overlaps.

3. Overlap Merging – The combination contigs are combined into the final assembly contigs.
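The idea behind the Combination step, finding condensation contigs that contain the same reads, can be sketched as a grouping problem. The following is only an illustration under stated assumptions: each contig is represented by a hypothetical ID and the set of read IDs it contains, and the function name `group_contigs_by_shared_reads` is invented for this example. The sketch identifies which contigs are candidates for merging. It does not perform the overlap detection or the merging itself, and it is not the NextGENe algorithm.

```python
def group_contigs_by_shared_reads(contig_reads):
    """Group contigs that are linked, directly or indirectly, by shared reads.

    contig_reads: dict mapping a contig ID to the set of read IDs it contains.
    Returns a list of sets of contig IDs; each set is one merge candidate group.
    """
    groups = []  # list of (contig_ids, read_ids); read sets stay pairwise disjoint
    for contig_id, reads in contig_reads.items():
        merged_ids, merged_reads = {contig_id}, set(reads)
        remaining = []
        for ids, group_reads in groups:
            if group_reads & merged_reads:
                # The existing group shares at least one read: absorb it.
                merged_ids |= ids
                merged_reads |= group_reads
            else:
                remaining.append((ids, group_reads))
        remaining.append((merged_ids, merged_reads))
        groups = remaining
    return [ids for ids, _ in groups]

# Hypothetical condensation contigs and the reads they contain.
contigs = {
    "c1": {"r1", "r2"},
    "c2": {"r2", "r3"},
    "c3": {"r7"},
    "c4": {"r3", "r9"},
}
for group in group_contigs_by_shared_reads(contigs):
    print(sorted(group))
```

Here c1, c2, and c4 end up in one group because c1 and c2 share r2 and c2 and c4 share r3, while c3 shares no read with any other contig and stays on its own.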

Setting	Description
Settings
	Select the assembly type that applies to your data: • Small Genome (< 10MB) • Large Genome • Sequence Repeats • PCR/Haplo/HLA Typing • Metagenomics • Others
Coverage Normalized to [30] X	Normalizes coverage for the assembly. This decreases processing time by ignoring reads where coverage is above the set threshold. The default value is 30.
Pair Normalized to [20]X	Available only for the Floton-PE assembly method. Automatically implemented if Coverage Normalized is selected. The coverage of paired reads is normalized to the value that you specify.
If you select Coverage Normalized, then you must also select one of the following methods, which determine which reads are kept and which reads are discarded.
• Method 1 (Selected)	Checks keywords (sequences between homopolymers) in the reads and preferentially keeps reads in which one or more of the keywords have low coverage. Note: Method 1 increases processing time.
• Method 2 (Random)	Randomly selects which reads are kept and which reads are discarded.
Note: The following output files are specific to the Floton/Floton-PE assembly method. To view a list of output files that are produced for any assembly method, see Sequence Assembly Output Files.
Output Condensation	Creates the *_CondensedSequences.fasta file, which is the output from the Condensation step. This file lists the extended sequence for each original read with the original data title and in the original data order.
Output Combination	Creates the *CombinedSequences_.fasta file, which contains the results for the Combination step.
Length Cut off <= [ ] x Avg Read Len or [300] bp	Rejects a contig whose length (number of base pairs) is less than or equal to the indicated threshold. You can specify the threshold in one of two ways: • A multiple of the average read length. • A specific number of base pairs. The default value is 300 bp.
Advanced
Automatic	Select this option to have NextGENe automatically determine the appropriate values for the Index Length, Index Count, and Remove Low Frequency options based on the loaded data. If you do not select Automatic, then you can manually select the values for these options.
• Index: Length [16] Flows	• Select a value to create an index of the indicated length that ends in a homopolymer sequence. The default value is 16 flows.
• Index Count [4] Per Read	• Select a value to create the indicated number of primary indices per read. The default value is four primary indices per read. The index number can be either one or an even value (2, 4, and so on). NextGENe prioritizes the indices based on factors such as the homopolymer length. For example, if the index number is set to four, then the two highest-priority indices in the first half of the read and the two highest-priority indices in the second half of the read are selected. If the index number is set to one, then the index with the highest priority is selected, regardless of which half of the read it falls in.
Note: For data with a higher average coverage, a smaller number of indices is recommended. Conversely, for data with a longer average read length, a larger number of indices is recommended.
Remove Low Frequency [ ] or [ ]%	Rejects the entire contig if the coverage is less than or equal to the indicated threshold, or trims the end of the contig if the coverage of the ending bases is less than or equal to the set percentage of the maximum coverage for the contig.
Error Tolerate [ ]% and Ignore [ ] bp	Combines two contigs only if the percent difference between the two contigs is less than or equal to the indicated threshold. When combining, ignores the differences in the indicated number of base pairs at the end of each contig.
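The threshold logic of the Length Cut off and Remove Low Frequency settings can be sketched as follows. This is a hedged illustration, not NextGENe code: the function names are hypothetical, the per-base coverage list is an assumed input format, and the sketch assumes that trimming applies to both ends of a contig, which the setting description does not state explicitly.

```python
def passes_length_cutoff(contig_len, avg_read_len=None, multiple=None, min_bp=300):
    """Return False when the contig length is <= the cut-off threshold.

    The threshold is either multiple * avg_read_len (when both are given)
    or a fixed number of base pairs (300 bp is the documented default).
    """
    if multiple is not None and avg_read_len is not None:
        threshold = multiple * avg_read_len
    else:
        threshold = min_bp
    return contig_len > threshold

def trim_low_coverage_ends(coverage, percent):
    """Return (start, end) slice bounds that remain after end trimming.

    coverage: per-base coverage values for one contig (assumed input).
    Ending bases are trimmed while their coverage is <= percent % of the
    contig's maximum coverage. Assumption: both ends are trimmed.
    """
    if not coverage:
        return (0, 0)
    cutoff = max(coverage) * percent / 100.0
    start, end = 0, len(coverage)
    while start < end and coverage[start] <= cutoff:
        start += 1
    while end > start and coverage[end - 1] <= cutoff:
        end -= 1
    return (start, end)

print(passes_length_cutoff(300))                                # exactly at the default threshold
print(passes_length_cutoff(450, avg_read_len=250, multiple=2))  # threshold is 500 bp
print(trim_low_coverage_ends([1, 2, 10, 20, 18, 3, 1], percent=10))
```

Both checks use "less than or equal to", as in the setting descriptions, so a contig of exactly 300 bp is rejected under the default length cut-off.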