To convert a sample file

Before you begin the file conversion process, review the information in the table below and make sure that you have correctly named your files and completed any other needed preparation before you load them into the NextGENe Format Conversion tool. Before you convert a file, you can also use the NextGENe File Preview tool to preview basic information about the file, which can help you determine the settings for the file conversion process. See The NextGENe File Preview Tool.

File Format	Comments
SEQ/PRB	The file names do not need to be identical, but they must be appended with the phrases “_seq” and “_prb” respectively. For example, SRR01842a_seq.txt and SRR01842c_prb.txt.
FASTQ (merged pairs)	Select this option for paired end files in FASTQ format that contain both reads in a pair in the same line in opposite orientation (Read 1 -> <- Read2). NextGENe converts these files by splitting each read in two. Two new files are created titled _1.fasta and _2.fasta with read names >/1 and >/2. The second half of the original read is reverse complemented and its quality scores are reversed to match. The file is then converted to .fasta format and quality filtering is implemented as with other FASTQ files.
• SCARF Numeric • SCARF ASCII	Caution: Make sure to choose the correct quality score format – either Numeric or ASCII.
• CFASTA	The SOLiD System instrument produces color space sequence reads in a .fasta format labeled as CSFASTA. If you select CFASTA as the input format type and FASTA as the output format type, then NextGENe converts the reads from color space to base space. Caution: An error in color space can propagate downstream within the read when the read is converted to base space, so SoftGenetics recommends that you leave the reads in color space. Note: You can select CSFASTA as the output format type to quality filter and quality trim the CSFASTA files without conversion. If you select this option, the output file remains in color space. This is the preferred conversion option for SOLiD System data. Note: You can quality trim reads using the .csfasta and .qual files only if the file names are identical, for example, SRR01842.cfasta and SRR01842_QV.qual.
FASTA	Select this option and choose CSFASTA as the output format type to convert .fasta files in base space into .csfasta files in color space.
Mate Pair SFF	Select this option for mate-pair files in SFF format that contain both reads in a pair in the same line. NextGENe converts these files by splitting each read in two. Two new files are created titled _1.fna and _2.fna with read names >/1 and >/2. The file is then converted to .fasta format and quality filtering is implemented as with other SFF files.
Mate Pair FASTQ	Select this option for mate-pair files in FASTQ format that contain both reads in a pair in the same line. NextGENe converts these files by splitting each read in two. Two new files are created titled _1.fna and _2.fna with read names >/1 and >/2. The file is then converted to .fasta format and quality filtering is implemented as with other FASTQ files.

1. Do one of the following:

• On the NextGENe main menu, click Tools > Format Conversion.

• In the Project Wizard, on the Load Data page, click Format Conversion.

The Format Conversion dialog box opens.

2. In the Instrument pane, select the instrument type.

3. In the Input pane, do the following:

a. Click Add to browse to and select the input data file.

After you load the file, NextGENe automatically selects the correct instrument/file type option in the Instrument pane.

b. On the Input format type dropdown list, select the input format type, for example, BAM.

4. In the Output pane, do the following:

a. On the Output format type dropdown list, select the output format type.

b. In the Output field, you can leave the default value for the location of the output files as is (the default value is the directory path for the last input data file that you selected), or you can click Set to select a different location.

5. Optionally, do one of the following:

• Click Load to browse to and select a Settings file (.ini file) to convert the files based on the saved settings in the file.

• In the Settings pane, click Default Settings to automatically select the quality settings that SoftGenetics has determined, from experience, are appropriate for the file type that is being converted.

• In the Settings pane, select the options for filtering and trimming low quality reads.

Option	Description
Median score threshold >= [20]	Selected by default. Removes entire reads from the sample file when the median quality score is below the specified threshold.
Max # of uncalled bases <= [3]	Selected by default. Removes entire reads from the sample file when the read contains more N calls than specified.
Called base number of each read >= [25]	Selected by default. Removes entire reads from the sample file when the number of called bases in the read is below the specified threshold. Note: If Trimming is also selected, then the called base number that is used for this function is the number of bases that remain after trimming.
Trim or reject read when >= [3] base(s) with score <= [16]	Selected by default. Trims low quality bases from reads when a consecutive number of bases (“x”) falls below the specified quality score threshold (“y”). Note: For additional information about how this option works, see Trim or Reject Read While >= [x] Bases with Score <= [y].
Paired read data	Available only if paired read data is being analyzed. Select this option if you are converting mate-paired or paired-end files. NextGENe uses a placeholder “N” for reads that are removed because of low quality, which is necessary to maintain mate-paired or paired-end read information.
Remove 5’ [2] bases and 3’ [4] bases	Trims the specified number of bases from the 5’ and/or 3’ ends.
Keep only bases [0] to [0]	Trims the reads to keep only the specified portion of the read.
Trim by sequences	Select this option to trim reads where the specified sequence occurs. See Trim by Sequences.
Trim by sequences in file	Select this option and then click the Browse button to load a tab-delimited text file that contains the sequences by which the reads are to be trimmed. See Trim by Sequences in the File.
Custom linker	Applicable only for mate-paired Roche data or mate-paired Ion Torrent data where both pairs are located in the same read. Select this option if you used a custom linker. NextGENe automatically detects the standard linker sequences.

Even if you select the options by which to filter and trim low quality reads, at any time, you can click Default Settings to clear your options and replace them with the preset values from SoftGenetics.
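
The three whole-read filters in the table above (median score threshold, maximum number of uncalled bases, and called base number) can be sketched in Python as follows. This is an illustrative approximation, not the NextGENe implementation. It assumes that quality scores have already been decoded to integers and that a "called" base is any base other than N. The function name passes_read_filters and its parameter names are hypothetical, and the default values mirror the defaults shown in the table.

```python
# Illustrative sketch only; not the NextGENe implementation.
from statistics import median
from typing import Sequence


def passes_read_filters(
    seq: str,
    quals: Sequence[int],
    median_min: int = 20,    # Median score threshold >= [20]
    max_uncalled: int = 3,   # Max # of uncalled bases <= [3]
    min_called: int = 25,    # Called base number of each read >= [25]
) -> bool:
    """Return True if the read is kept, False if it would be removed."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return False
    uncalled = seq.upper().count("N")   # assumption: N marks an uncalled base
    called = len(seq) - uncalled
    if median(quals) < median_min:
        return False
    if uncalled > max_uncalled:
        return False
    if called < min_called:
        return False
    return True


# A 30-base read with uniformly high quality and no N calls is kept.
print(passes_read_filters("A" * 30, [30] * 30))
```

If trimming were also applied, the sequence and scores passed to this check would be the bases that remain after trimming, as the note for the called base number option describes.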

6. Optionally, before you process the files, click Save to save the settings that you have specified to a Settings file (.ini file).

You can always load this file at a later date and process other data files according to the saved settings in the file.

7. Do one of the following:

• Click Add Job to save this job, and open another tab for a file conversion. Repeat this step to add all needed conversion jobs, and then click OK to run the jobs in the order in which you created them. The converted files are saved in the directory that you specified in Step 4.

• Click OK to immediately run this job. The converted file is saved in the directory that you specified in Step 4.