Sequence Condensation Tool - General Settings

NextGENe Online Help : Sequence Condensation Tool : Sequence Condensation Tool - General Settings

Setting	Description
Inspect Input Files	Available only if you are analyzing Illumina data, SOLiD System data, or Ion Torrent data. Click this button to have the Condensation Tool scan your data files and determine optimum settings on this page as well as on the Advanced Settings page.
Read Counts	The range that best describes the number of reads that are included in your sample dataset. After you click Inspect Input Files, the value for Illumina datasets, SOLiD System datasets, or Ion Torrent datasets is automatically set, but you can modify the value if needed. Note: If multiple data files are being analyzed, this value is the total for all files.
Read Lengths	The number that best represents the length of reads for your sample dataset. After you click Inspect Input Files, the value for Illumina datasets, SOLiD System datasets, or Ion Torrent datasets is automatically set, but you can modify the value if needed.
Reference Length	The number that best represents the length of the reference sequence. When a reference file is loaded, after you click Inspect Input Files, the value for Illumina datasets, SOLiD System datasets, or Ion Torrent datasets is automatically set, but you can modify the value if needed. For preloaded reference files, you must enter the value manually. Note: For de novo assembly, which does not include a reference file, you can manually specify this value, which is used to estimate the expected coverage.
Expected Depth of Coverage	The range that best represents the expected depth of coverage for your sample dataset. After you click Inspect Input Files, the value for Illumina datasets, SOLiD System datasets, or Ion Torrent datasets is automatically set to the total number of bases in the sample files divided by the number of bases in the reference file. For identifying low-frequency variations, set the Expected Depth of Coverage to the expected depth of coverage of the minor allele. You can modify the value if: • There are many reference positions that will have no coverage. • There are many bases in the sample files that will not match the selected reference. • The minor allele might be found at a depth of coverage lower than what was calculated.
Condensation Type	For Illumina data, SOLiD System data, or Ion Torrent data, select one of the following: • Consolidation (to reduce read number) • Elongation (to maintain read count) • Error Correction (to reduce errors without reducing read count or lengthening reads) For Roche/454 data, the only available option is Error Correction.
Paired	Available only if you select Elongation for Illumina data. Click this option to open the Merge Overlapping Paired Reads dialog box. In this dialog box, you can indicate that you want to merge overlapping paired reads after elongation. You can also indicate whether you want to ignore low-quality ends for non-overlapped pairs. You also have two options for setting an acceptable length for the merged results: • Merged Length [ ] bp to [1000] bp • Merged Length [70] bp to [130] % of the longer read length You can select one or both options; however, if you select both, the merged read must meet both criteria to be included in the results. Note: The recommended value for the minimum number of bases that must overlap for paired reads to be merged correctly is nine. You can select a value lower than nine, but less overlap is then required between the paired reads, so your results might be less reliable. You can also select a value greater than nine, but more overlap is then required for the reads to be merged, which might result in fewer paired reads being merged. See Merging Paired End Reads.
Save Score	Creates a .qual file that contains information about the number of reads that are used in each subgroup for condensation.
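Inspect Input Files fills in Read Counts and Read Lengths for you. If you want to sanity-check those values by hand, the Python sketch below shows one way to tally reads and bases across several files. It is not NextGENe's implementation: it assumes uncompressed FASTQ files with standard four-line records, the function name `summarize_fastq` is hypothetical, and using the median as the "representative" read length is an assumption.

```python
import statistics
from typing import Iterable


def summarize_fastq(files: Iterable[Iterable[str]]) -> dict:
    """Tally reads and bases across one or more FASTQ inputs.

    Each element of `files` is any iterable of lines (an open file handle
    works). Assumes uncompressed FASTQ with four-line records and no
    wrapped sequence lines.
    """
    lengths = []
    for lines in files:
        for i, line in enumerate(lines):
            # Line 2 of every 4-line record (index 1, 5, 9, ...) is the sequence.
            if i % 4 == 1:
                lengths.append(len(line.strip()))
    return {
        # Pooled over all files, matching how Read Counts is described.
        "read_count": len(lengths),
        "total_bases": sum(lengths),
        # Median as one possible "representative" read length (an assumption).
        "median_read_length": statistics.median(lengths) if lengths else 0,
    }
```

Because Read Counts is the total for all files when several files are analyzed, the sketch pools every file into a single count rather than reporting each file separately. The `total_bases` value is the numerator of the expected-coverage calculation.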
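The Expected Depth of Coverage that Inspect Input Files calculates is the total number of bases in the sample files divided by the number of bases in the reference. The Python sketch below reproduces that arithmetic so you can check or adjust the value. It assumes an uncompressed FASTA reference, the function names are hypothetical, and scaling the coverage by an allele fraction to estimate a minor allele's depth is an assumption that ignores uneven coverage.

```python
from typing import Iterable


def reference_length_from_fasta(lines: Iterable[str]) -> int:
    """Count bases in a FASTA reference, skipping '>' header lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def expected_coverage(total_sample_bases: int, reference_length: int) -> float:
    """Expected depth = total bases in the sample files / bases in the reference."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return total_sample_bases / reference_length


def minor_allele_depth(coverage: float, allele_fraction: float) -> float:
    """Rough depth for an allele present in `allele_fraction` of the reads.

    Assumes coverage is uniform across the reference, which real data rarely is.
    """
    return coverage * allele_fraction
```

For example, 2,000,000 reads of 100 bp (200,000,000 bases) against a 5,000,000 bp reference give an expected coverage of 40, and a minor allele at 5% would then be expected at a depth of about 2. For de novo assembly, use the Reference Length you enter manually (for example, an estimated genome size) in place of the FASTA-derived value.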
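The Merge Overlapping Paired Reads options combine two checks: a minimum overlap between the two reads of a pair, and an acceptable length for the merged read. The Python sketch below illustrates both under simplifying assumptions: read 2 is reverse-complemented before comparison, the overlap must match exactly (no mismatches or quality weighting), read-through pairs in which read 2 extends past the start of read 1 are not handled, and both bounds of the percentage option are treated as percentages of the longer read length. This is a generic suffix/prefix overlap merge, not NextGENe's algorithm, and all function names are hypothetical. The default minimum overlap of 9 follows the recommended value described for the Paired option.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 9) -> Optional[str]:
    """Merge a pair whose ends overlap by at least `min_overlap` bases.

    Returns None when no exact overlap of that length is found.
    """
    rc2 = reverse_complement(read2)
    # Try the longest possible overlap first, down to min_overlap (at least 1).
    for k in range(min(len(read1), len(rc2)), max(min_overlap, 1) - 1, -1):
        if read1[-k:] == rc2[:k]:
            return read1 + rc2[k:]
    return None


def passes_length_filter(
    merged_len: int,
    read1_len: int,
    read2_len: int,
    bp_range: Optional[Tuple[int, int]] = None,
    pct_range: Optional[Tuple[float, float]] = None,
) -> bool:
    """Apply whichever length criteria are enabled; all enabled ones must pass."""
    if bp_range is not None:
        lo_bp, hi_bp = bp_range
        if not lo_bp <= merged_len <= hi_bp:
            return False
    if pct_range is not None:
        lo_pct, hi_pct = pct_range
        # Percentage is relative to the longer of the two input reads.
        pct = 100.0 * merged_len / max(read1_len, read2_len)
        if not lo_pct <= pct <= hi_pct:
            return False
    return True
```

With two 100 bp reads and a 70 to 130% window, a 120 bp merged read passes and a 150 bp merged read is rejected. Raising `min_overlap` rejects pairs whose true overlap is shorter, which mirrors the trade-off described for values above nine.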