Note: The following procedure describes how to generate a new CNV Tool report. Optionally, you can click Load Settings to browse to and select a Settings file (.ini file) and generate the report based on the settings saved in the file. While you create a new report, you can click Default at any time to return all values on all tabs to their default values.
Note: You can add up to 48 sample project (.pjt) files. If you use the Batch Add option, and the folder that you select contains more than 48 .pjt files, then an error message opens indicating this, and only the first 48 .pjt files in the folder are added. The remaining .pjt files are not added.
Note: You can add up to 24 control project (.pjt) files. If you use the Batch Add option, and the folder that you select contains more than 24 .pjt files, then an error message opens indicating this, and only the first 24 .pjt files in the folder are added. The remaining .pjt files are not added.
Control | Description |
---|---|
Best Match | Select the single control project that has the best correlation to the sample project when comparing coverage in each region as the control project. Ignore the other projects. |
Average Controls | Use the average coverage in each region across all control projects as the control value. |
Median Controls | Use the median coverage in each region across all control projects as the control value. |
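
The three control options above can be illustrated with a minimal Python sketch. It is not NextGENe's implementation: the function names are hypothetical, and the use of the Pearson coefficient for "best correlation" is an assumption, because the table does not say which correlation measure is used.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length coverage lists (assumed measure)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    # Note: fails with ZeroDivisionError if either list is constant.
    return cov / (var_x * var_y) ** 0.5


def control_values(sample, controls, mode):
    """Return one control coverage value per region.

    sample   -- per-region coverage for the sample project
    controls -- list of per-region coverage lists, one per control project
    mode     -- "best_match", "average", or "median" (hypothetical labels)
    """
    if mode == "best_match":
        # Keep only the control that correlates best with the sample.
        return max(controls, key=lambda c: pearson(sample, c))
    if mode == "average":
        return [mean(region) for region in zip(*controls)]
    if mode == "median":
        return [median(region) for region in zip(*controls)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    sample = [10, 20, 30, 40]
    controls = [[12, 18, 33, 41], [40, 30, 20, 10], [11, 25, 28, 60]]
    for mode in ("best_match", "average", "median"):
        print(mode, control_values(sample, controls, mode))
```

Best Match discards all but one control, whereas the average and median options combine every control region by region; the median is less affected by a single outlying control.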
Option | Description |
---|---|
Use segments as defined in the reference files. | • CDS - Report coverage levels for each coding region. • Exon - Report coverage levels for each mRNA region (coding and non-coding exons). • Continuous Exon - Report coverage levels for the entire mRNA for a gene, one region per gene. • Continuous CDS - Report coverage levels for the entire coding region for a gene, one region per gene. • ROI - Report coverage levels based on Regions of Interest that are defined in a GenBank reference file. Note: For information about defining Regions of Interest in a GenBank reference file, see Advanced GBK Editor tool. |
Set incremental segment length | Specify the segment length, relative to either the reference positions in the contig or the chromosome positions. |
Input region of interest (*.bed) | You can upload a Region of Interest file in a BED format. |
Limit reporting to BED regions with specific descriptions | Optional. If you uploaded a Region of Interest file in BED format, then you can create a text file that lists the descriptions for selected regions of this file. (Each description must be on its own line.) You can then upload this text file. All calculations are still carried out on the whole BED file, but only those regions of the uploaded BED file whose descriptions match those in the text file are included in the CNV report. Note: BED file descriptions are optional. If they are included in a BED file, then they are located in column 4. |
Exclude Chr X | Optional. Select this chromosome to be excluded from the comparison. |
Exclude Chr Y | Optional. Select this chromosome to be excluded from the comparison. |
Exclude Chr M | Optional. Select this chromosome to be excluded from the comparison. |
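
The description-based reporting limit described above can be sketched in Python as follows. This is only an illustration of the matching idea: the function name is hypothetical, the region names in the usage example are made up, and exact, case-sensitive matching of column 4 is an assumption that the table does not confirm.

```python
def filter_bed_by_description(bed_lines, wanted_descriptions):
    """Keep BED lines whose column 4 matches a listed description.

    bed_lines           -- iterable of tab-delimited BED lines
    wanted_descriptions -- iterable of descriptions, one per text-file line
    """
    wanted = {d.strip() for d in wanted_descriptions if d.strip()}
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and standard BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # The description is optional; when present it is column 4.
        if len(fields) >= 4 and fields[3] in wanted:
            kept.append(line)
    return kept


if __name__ == "__main__":
    bed = [
        "chr1\t100\t200\tregionA",
        "chr1\t300\t400\tregionB",
        "chr2\t50\t150",  # no description column
        "chr13\t10\t90\tregionC",
    ]
    descriptions = ["regionA\n", "regionC\n"]
    for line in filter_bed_by_description(bed, descriptions):
        print(line)
```

Regions without a description can never match, so they would be left out of a report that is limited this way.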
Fitting Method | Description |
---|---|
Note: If you change any of the values that are listed below, then you can click Default at any time to return all values on all tabs on the dialog box to their default values. | |
Auto fitting | Selected by default. Automatic fitting is the recommended approach for large panels (thousands of regions/exons) and whole exome sequencing. With this method, a line is automatically fit to the dispersion fitting points. Manual fitting is recommended for small targeted panels (fewer than hundreds of regions/exons), especially if the data does not have a lot of noise. The number of points for automatic fitting should be chosen so that each fitting point accurately reflects a sufficient number of raw data points. • If Custom fitting point number is not selected, then NextGENe automatically selects the appropriate number of points based on the regions. • If Custom fitting point number is selected, then the default value of 15 fitting points is typically acceptable for most data for large panels; however, if you have a small number of raw data points, then the rule of thumb is one fitting point for every 100 raw data points, so you can decrease this value as needed. For example, if your data has 375 regions, then you would set the number of points to three or four fitting points for Auto fitting. Even with a smaller number of regions, the number of points for Auto fitting should never be less than three. Note: Typically, even if you know that a manual fitting or a manual dispersion is the appropriate approach for your data, you should run an automatic fitting first, and then view the resulting data so that you have an idea of how to modify all the fitting settings for either method. |
Manual fitting | For Manual fitting, "a" and "b" represent the values for the line that is fit to the dispersion fitting points. These values are automatically populated after an Automatic fitting. You must modify these values for a Manual fitting. The Minimum Dispersion value is the minimum threshold for the dispersion of the data, regardless of the value that is set for "a". As with Auto fitting, the number of points for manual fitting should be chosen so that each fitting point accurately reflects a sufficient number of raw data points. • If Custom fitting point number is not selected, then NextGENe automatically selects the appropriate number of points based on the regions. • If Custom fitting point number is selected, then the default value of 15 fitting points is typically acceptable for most data for large panels; however, if you have a small number of raw data points, then, again, the rule of thumb is one fitting point for every 100 raw data points, so you can decrease this value as needed. |
Fixed dispersion value | Select this option to use a single dispersion value for all regions in lieu of fitting a line to all the dispersion points. As with the other fitting methods, the number of points for manual dispersion should be chosen so that each fitting point accurately reflects a sufficient number of raw data points. • If Custom fitting point number is not selected, then NextGENe automatically selects the appropriate number of points based on the regions. • If Custom fitting point number is selected, then the default value of 15 fitting points is typically acceptable for most data for large panels; however, if you have a small number of raw data points, then, again, the rule of thumb is one fitting point for every 100 raw data points, so you can decrease this value as needed. Note: The Fixed dispersion option is useful for targeted panels where the dispersion (noise) is relatively low. • Auto-Detect: The manual dispersion value is automatically adjusted. This automatically chosen value works well in most cases, but you can modify this value as needed. You can select this value to be displayed in the CNV report. |
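
The rule of thumb for a custom fitting point number (one fitting point per 100 raw data points, and never fewer than three for Auto fitting) can be worked through with a small Python sketch. The function name is hypothetical, and returning a low/high pair is simply one way to express the "three or four" example given above; it is not how NextGENe selects the value.

```python
import math

MIN_FITTING_POINTS = 3  # stated lower limit for Auto fitting


def suggested_fitting_points(raw_points):
    """Return a (low, high) range of fitting points for a region count.

    Applies one fitting point per 100 raw data points, rounded down and up,
    and raises both ends to the minimum of three.
    """
    per_hundred = raw_points / 100
    low = max(MIN_FITTING_POINTS, math.floor(per_hundred))
    high = max(MIN_FITTING_POINTS, math.ceil(per_hundred))
    return low, high


if __name__ == "__main__":
    for regions in (120, 375, 1500):
        print(regions, suggested_fitting_points(regions))
```

For 375 regions, the sketch gives the same three-to-four range as the example in the table, and 1500 regions gives the default of 15.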
Setting | Description |
---|---|
Note: If you make a change to any of the values below, at any time, you can click Default to return all values on all tabs on the dialog box to their default values. | |
Minimum Normalized Read Counts | Applicable only if Normalized Counts is selected. Any regions where the total Normalized Read Counts fall below this value are labeled as Uncalled in the CNV Tool report. |
Minimum RPKM | Applicable only if RPKM is selected. Any regions where the total RPKM falls below this value are labeled as Uncalled in the CNV Tool report. |
Minimum region length | Minimum size of a region in base pairs for the region to be included in the CNV Tool report. |
Expected CNV Percentage [5.00]% | Indicates the percentage of regions in which CNV calls are expected to be made. Note: Typically, the default value of 5% is acceptable for most data. If the data is confident (not noisy), then increasing this value does not significantly increase the percentage of regions in which CNV calls are made. If the data is not confident (noisy), then increasing this value increases the percentage of regions in which CNV calls are made. |
Estimated sample purity | If the sample is mixed, or it has possible contamination, then enter an appropriate sample purity to adjust the calculations accordingly. |
Setting | Description |
---|---|
Note: If you make a change to any of the values below, at any time, you can click Default to return all values on all tabs on the dialog box to their default values. | |
Neighbor ratio settings | |
Perfect heterozygote SNP | Indicates the frequency requirements for perfect heterozygote SNP positions. Both the reference and variant allele must be found at a frequency that is above the specified threshold, or the SNP is not used to determine the median coverage for the region. The default value is 40%, which means that any variant that is found at a frequency between 40% and 60% is considered to be a perfect heterozygote SNP. |
Smooth Log2Ratio | Selected by default. You can clear this option to omit the step of checking Neighbor Ratios. |
• High Resolution (3) | • Optimizes the detection sensitivity to call CNVs for smaller regions, such as CNVs that include only part of a gene. Considers three regions total - the region itself and one neighbor region on each side. |
• Low Resolution (41) | • Optimizes the detection to call larger CNVs, such as CNVs that include multiple genes or a whole chromosome. Considers 41 regions total - the region itself and 20 neighbor regions on each side. |
• Customized resolution | • Specify the number of regions that are to be considered for making the CNV call, where the number must reflect the region itself and the same number of neighbor regions on each side. |
Deletion and duplication calls using log2 ratio | |
• Auto • Manual | • NextGENe automatically makes the calls. • Manually define the range of log2ratio values for regions to be reported as Normal (default is -0.60 to 0.50). Regions with log2ratio values outside of this range are reported as Deletion or Duplication. |
Display limits | Available only if Manual is selected for the call option. Set the range that is displayed for the minimum and maximum values on the Log2ratio (Y) axis for the CNV graph relative to values defined for the Manual option. See CNV graphs. |
• Large block • Single point | • Detect and report on CNV regions that are larger than 3 regions. • Detect and report on CNV regions that are single regions or larger. |
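
The settings in this table can be illustrated with a short Python sketch covering the perfect heterozygote check, the neighbor window for a given resolution, and a manual call from the log2 ratio. All function names are hypothetical and none of this is NextGENe's code. Three points are assumptions: allele frequencies are computed from the reference and variant reads only, values exactly at a threshold are treated as passing or Normal, and windows are clipped at the first and last region.

```python
def is_perfect_het(ref_reads, var_reads, threshold=0.40):
    """True if both alleles reach the frequency threshold (default 40%)."""
    total = ref_reads + var_reads
    if total == 0:
        return False
    return ref_reads / total >= threshold and var_reads / total >= threshold


def neighbor_window(index, total_regions, n_regions):
    """Indices considered for a call: the region plus equal neighbors per side.

    n_regions is the resolution value, e.g. 3 (High) or 41 (Low), so it must
    be odd. Windows are clipped at the ends of the region list (assumption).
    """
    if n_regions < 1 or n_regions % 2 == 0:
        raise ValueError("n_regions must be a positive odd number")
    half = n_regions // 2
    start = max(0, index - half)
    end = min(total_regions - 1, index + half)
    return list(range(start, end + 1))


def manual_call(log2_ratio, normal_min=-0.60, normal_max=0.50):
    """Classify a region using the Manual range for Normal."""
    if log2_ratio < normal_min:
        return "Deletion"
    if log2_ratio > normal_max:
        return "Duplication"
    return "Normal"


if __name__ == "__main__":
    print(is_perfect_het(45, 55), is_perfect_het(30, 70))
    print(neighbor_window(10, 100, 3))
    print(len(neighbor_window(50, 100, 41)))
    print([manual_call(v) for v in (-0.8, 0.0, 0.7)])
```

A resolution of 3 gives one neighbor on each side and 41 gives 20 on each side, which is why a customized resolution has to be an odd number.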
Setting | Description |
---|---|
Common Display Settings | |
Index | An ordered count of the segments that are used in the report. |
Chr • Name • Number | • The name of the chromosome that the segment is on. • The number of the chromosome that the segment is on. |
Chr Position Start | The base number that indicates where the segment starts in the chromosome. |
Chr Position End | The ending base number that indicates where the segment ends in the chromosome. |
Entrez Gene ID | The unique integer ID generated for the gene by Entrez Gene. |
Gene | The gene name for the segment when the segment is the whole gene or the name of the gene on which the segment is found. |
Exon | The exon number where the segment is found. This number includes non-coding exons. |
CDS | The coding sequence number for the segment. |
RNA Accession | The RNA accession for the gene from NCBI. |
Protein Accession | The protein accession for the gene from NCBI. |
Description | Available if the reference file is a .fasta file with multiple segments. Select this option to display the title line for each segment in the Description column. |
Contig | The contig on which the segment is located. The contig is based on the genome assembly from the NCBI. |
Locus Tag | An alternate way to identify the gene. |
Start | The starting location for the reference region. |
End | The ending location for the reference region. |
Length | The total length of the reference region, which provides for easy identification of expressed regions by size (such as when locating small RNA transcripts). |
Dispersion | The dispersion value for the region. N/A for Uncalled regions. |
Normalized Likelihoods | The normalized likelihood value for each potential CNV call (duplication, deletion, or normal). A likelihood value closer to zero indicates an increased likelihood for the call. |
Dispersion and HMM Display Settings, Normalized Counts selected | |
Normalized Read Count | The Normalized Read Counts for both the sample and the control. |
Ratio | The ratio of the sample Normalized Read Count to the total Normalized Read Count for the region. |
Total Read Counts | The sum of the Sample read counts and the Control read counts. |
Dispersion and HMM Display Settings, RPKM selected | |
RPKM | Reads per Kilobase Exon Model per Million mapped reads. RPKM = 10^9 * R / (T*L) where: • R = Number of mapped reads in a region • T = Total number of mapped reads. • L = Length of the region. Normalizes the expression levels based on the length of the reference region and the total number of aligned reads. |
FPKM | Applicable only if the project used paired-end data. Fragments per Kilobase of exon per Million mapped reads. FPKM = 10^9 * F / (T*L) where: • F = Number of mapped fragments in a region and: • A “fragment” corresponds to a pair of reads. • Single reads are not counted. • The position of a fragment is the location between the two 5’ ends of the pairs. • T = Total number of mapped fragments. • L = Length of the region. Normalizes the expression levels for paired end data based on the length of the reference region and the total number of aligned reads. |
Ratio | The ratio of the sample RPKM to total RPKM for the region. |
Total RPKM | The sum of the Sample RPKM and the Control RPKM. |
SNP-Based Normalization Display Settings | |
Position Selected | The chromosome position for the heterozygous SNP at which the median coverage value is obtained. |
Original Coverage | The un-normalized median coverage values for the region for the sample and control. |
Normalized Coverage | The median coverage following global normalization for the region for the sample and the control. |
Control Allele | Read count for the alleles at the Position Selected in the control project. If there are more than two alleles, then only the two most frequent alleles are reported. |
Sample Allele | Read count for the alleles at the Position Selected in the sample project. If there are more than two alleles, then only the two most frequent alleles are reported. |
Log2 Ratio | The Log2 of the ratio of the normalized coverages of the two sample files. |
Neighbor Ratios | The Log2 ratios for the current region followed by the Log2 ratios of the neighbor regions. |
Dispersion HMM | Select this option to include the Dispersion HMM analysis in the report results. Note: Neighbor Ratios must also be selected. |
Common Filter Settings | |
Display Deletion | Selected by default. Show CNVs that are classified as Deletions. Clear this option to hide this classification from the CNV Tool report. |
Display Normal | Selected by default. Show regions that are classified as Normal (little evidence of a CNV). Clear this option to hide this classification from the CNV Tool report. |
Display Duplication | Selected by default. Show CNVs that are classified as Duplications. Clear this option to hide this classification from the CNV Tool report. |
Display Uncalled | Selected by default. Show regions that are classified as Uncalled. Clear this option to hide this classification from the CNV Tool report. |
In BED region | Not available if a BED file was loaded on the Basic Settings tab. Filters the CNV Tool report to include only those regions that are contained in the BED file. Click Set to browse to and select the appropriate BED file. Note: A BED file is a tab-delimited text file. You can upload a BED file only if the reference sequence contains chromosome information, which means that the reference sequence must be either a preloaded reference that NextGENe supplies, or a GenBank reference file that contains chromosome information. |
HMM and Dispersion Filter Settings | |
Score | Filter the calls shown (Deletion, Normal, and Duplication) based on their respective scores. The default value is 1.000, which means that all calls with a score > 1.000 are shown in the report. You can modify this value as needed. |
SNP-Based Normalization with smoothing Filter Settings | |
Log2 Ratio <= [0.700] or >= [-0.700] | Display only those regions where the Log2 of the ratio of the normalized coverages of the two sample files is above or below the set thresholds. |
Scores >= [3.000] | Show only regions where the Phred-scaled score for at least one potential call (insertion, deletion, or normal) meets or exceeds the set threshold. |
Minimum Coverage At Least For One Project >= [5.00] | Default value is 30. At least one project (sample file) must contain at least the minimum read count in the selected regions, or the CNV calculations are not carried out for the region and the region is not included in the report. |
Show regions with low coverage | Include regions whose coverage falls below the indicated minimum coverage in the report. N/A is displayed for the Log2 Ratio value for these regions. |
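
The RPKM formula and the ratio columns described in the display settings above can be checked with a minimal Python sketch. The function names are hypothetical. The same formula applies to FPKM when fragments are counted instead of reads. Dividing the sample coverage by the control coverage for the Log2 Ratio is an assumption, because the table does not state which value is the numerator.

```python
import math


def rpkm(mapped_in_region, total_mapped, region_length):
    """RPKM = 10^9 * R / (T * L), as defined in the display settings.

    mapped_in_region -- R, reads (or fragments for FPKM) mapped in the region
    total_mapped     -- T, total mapped reads (or fragments)
    region_length    -- L, length of the region in bases
    """
    return 1e9 * mapped_in_region / (total_mapped * region_length)


def ratio_to_total(sample_value, control_value):
    """Ratio of the sample value to the sample-plus-control total."""
    return sample_value / (sample_value + control_value)


def log2_ratio(sample_coverage, control_coverage):
    """Log2 of sample over control normalized coverage (assumed direction)."""
    return math.log2(sample_coverage / control_coverage)


if __name__ == "__main__":
    sample_rpkm = rpkm(500, 10_000_000, 2000)
    control_rpkm = rpkm(1500, 10_000_000, 2000)
    print(sample_rpkm, control_rpkm)
    print(ratio_to_total(sample_rpkm, control_rpkm))
    print(log2_ratio(60, 30), log2_ratio(15, 30))
```

Under the assumed direction, a region with twice the control coverage has a Log2 Ratio of 1, and a region with half the control coverage has a Log2 Ratio of -1.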
Note: You can click Load Settings to select this Settings file at a later date and generate the report according to the saved settings in the file.
Option | Description |
---|---|
Select a different sample to view in the report display | If you selected multiple samples for the report, then the report toolbar displays a Sample dropdown list of all the samples that were analyzed for the report. You can select a different sample on this list to update the report display accordingly. |
View the region of the genomic database in the Database of Genomic Variants (DGV) for which the call was made | Click the call type in the HMM Calls column. |
Load different projects and/or change the project settings | On the report toolbar, click the Load Projects icon. |
Modify the CNV report settings | On the report toolbar, click the Settings icon. |
Save the report to a text file | On the report toolbar, click the Save Report icon. A default name (<project_name>_CNVReport) and location (project folder) are provided for the file, but you can change both of these values. |
Generate the Gene CNV report | Applicable only for SNP-Based Normalization with smoothing. On the report toolbar, click the Gene CNV report icon. |
Generate the Block CNV report | On the report toolbar, click the Block CNV report icon. |
Generate the graphical display of the data | On the report toolbar, click the CNV Graphs icon. |