Introduction to the NextGENe Sensitive Aneuploidy Detection Tool
The NextGENe Sensitive Aneuploidy Detection tool (the NextGENe SAD tool) is designed to detect whole-chromosome aneuploidies in low-coverage whole-genome sequencing samples. Using the tool to detect whole-chromosome aneuploidies consists of the following steps:
1. Creating and training a model that is based on modeling control samples, where a modeling control sample is a negative control (no aneuploidies).
2. Optionally, testing the model with one or more test samples, which are positive controls (known aneuploidies), to estimate the sensitivity and specificity of the model.
3. Analyzing your samples based on the generated and trained model, and making aneuploidy calls based on this sample analysis.
By default, when you create and train a model, the tool targets chromosomes 13, 18, and 21, which are the “industry standard” chromosomes for aneuploidy detection; however, you can select any combination of up to six chromosomes for aneuploidy detection. The tool can align raw data samples (.fastq, .fq, or .fasta) before analyzing them, or it can analyze samples that are already aligned (.bam). BAM files must be aligned to a Build 37 reference.
At a high level, the algorithm on which the model is based compares the ratio value for each targeted chromosome to the normal distribution of that ratio value in the control samples. The result is a normalized chromosome value (NCV), which is similar to a z-score. Thresholds are defined for these NCVs, which you can then use to make aneuploidy calls for patient samples. The algorithm carries out the following steps:
1. Record the total number of reads for all chromosomes (targeted and control).
2. Calculate the ratio, where:
Ratio = (Total # of reads in a targeted, non-sex chromosome) / (Avg. total # of reads in a combination of non-targeted, non-sex chromosomes)
The algorithm iteratively determines the combination of non-targeted chromosomes that can provide an average number of total reads that is the most similar to the total number of reads for the targeted chromosome. This means that different targeted chromosomes can have a different combination of non-targeted chromosomes for the denominator in the Ratio calculation.
3. Calculate the NCVs for each patient sample, where:
NCV = (Patient Sample Ratio - Mean Control Ratio) / (Standard Deviation of Control Ratios)
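The three steps above can be illustrated with a short Python sketch. It is not the NextGENe SAD tool's implementation: the function names, the read counts, and the `max_size` limit are hypothetical. It also swaps in a brute-force search over small chromosome combinations for the tool's iterative selection, and it assumes the sample (n - 1) standard deviation. The manual does not state either detail.

```python
from itertools import combinations
from statistics import mean, stdev


def choose_reference_set(controls, target, candidates, max_size=2):
    """Pick the non-targeted chromosomes whose average read count is closest
    to the targeted chromosome's read count across all control samples.
    Brute-force search; an assumption, not the tool's own procedure."""
    best, best_gap = None, float("inf")
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            gap = sum(abs(s[target] - mean(s[c] for c in combo)) for s in controls)
            if gap < best_gap:
                best, best_gap = combo, gap
    return best


def ratio(sample, target, reference_set):
    """Step 2: targeted reads divided by average reads in the reference set."""
    return sample[target] / mean(sample[c] for c in reference_set)


def ncv(sample, controls, target, reference_set):
    """Step 3: (sample ratio - mean control ratio) / SD of control ratios."""
    control_ratios = [ratio(s, target, reference_set) for s in controls]
    return (ratio(sample, target, reference_set) - mean(control_ratios)) / stdev(control_ratios)


# Step 1: total reads per chromosome (made-up counts for illustration only).
controls = [
    {"chr1": 8000, "chr14": 3000, "chr20": 2100, "chr21": 1000},
    {"chr1": 8100, "chr14": 3050, "chr20": 2000, "chr21": 1020},
    {"chr1": 7900, "chr14": 2950, "chr20": 2050, "chr21": 980},
]
patient = {"chr1": 8000, "chr14": 3000, "chr20": 2050, "chr21": 1500}

reference_set = choose_reference_set(controls, "chr21", ["chr1", "chr14", "chr20"])
patient_ncv = ncv(patient, controls, "chr21", reference_set)
print(reference_set, round(patient_ncv, 2))
```

In this sketch, the patient sample has an excess of chromosome 21 reads relative to the controls, so its NCV is far above zero. Whether a given NCV supports an aneuploidy call depends on the thresholds defined for the model.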