Setting | Description |
---|---|
Paired End Data | Select this option if you are assembling paired end data. |
• Library Size | • The size of the fragment that is being sequenced. |
• Long Library Size (> 1000 Bases) | • If the library is greater than 1000 bases, then in addition to specifying the library size, you must also select this option. |
Section Size | Available only if Long Library is selected. Scaffold contigs are broken into sections when they are being assembled so that the distance between the contigs can be estimated. For the majority of datasets, the default value of 400 is the recommended value. |
Minimum Scaffold Length | Available only if Long Library is selected. Any scaffold contigs that are shorter than the specified Minimum Scaffold Length are discarded and are not used in the generation of the final contigs. |
Word Length | The word length that is used for scaffolding. Choose this value according to the average depth of coverage for the data: the lower the coverage, the shorter the word length should be, and the higher the coverage, the longer it should be. (Longer word lengths result in greater noise reduction.) If coverage falls within the range of 20-30x, the recommended word length is 23. If coverage is approximately 50x, the recommended word length is 29. The maximum recommended value for word length is 31. |
High Coverage Limited: Max Coverage = [x] | The maximum coverage that is to be used for assembly. For sequences with coverage above this limit, only reads up to the maximum coverage are used. Additional reads that cover the same sequence are ignored, which increases processing speed. |
Final Contig Merging | After scaffolding and linking with the paired reads are complete, merges any overlapping contigs that were found. |
Reduce Memory Usage | When this option is selected, only the 5’ end of the read is used to create “words” for indexing (to determine overlaps). The number of bases used for indexing is (0.5 + (20/L)) × L, where L is the average read length. Note: The memory that is conserved by this method is more significant for longer reads. For 36 bp reads, the formula yields 38 bases, which exceeds the read length, so there is no difference in the memory that is used. |
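
The Reduce Memory Usage formula can be checked with a short calculation. The sketch below is illustrative only and is not part of the software: the function name `indexed_bases` is hypothetical, and capping the result at the read length is an assumption made here, on the grounds that a read cannot supply more bases than it contains.

```python
def indexed_bases(avg_read_length: float) -> float:
    """Return the number of 5' bases used for indexing, per (0.5 + 20/L) * L.

    Hypothetical helper for illustration; not an API of the assembler.
    """
    if avg_read_length <= 0:
        raise ValueError("average read length must be positive")
    raw = (0.5 + 20.0 / avg_read_length) * avg_read_length
    # Assumption: the indexed portion cannot be longer than the read itself.
    return min(raw, avg_read_length)


# Compare a few average read lengths.
for length in (36, 40, 100, 250):
    used = indexed_bases(length)
    print(f"L={length}: index {used:.0f} bases ({used / length:.0%} of the read)")
```

Under these assumptions, the whole read is indexed for average read lengths up to 40 bases, which is consistent with the note that 36 bp reads see no memory difference. At 100 bases, about 70 bases are indexed, and the indexed fraction keeps shrinking toward one half as reads get longer.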