Using Unique Molecular Identifiers (UMIs) to Remove PCR Duplicates for Increased Accuracy

NextGENe software’s Sequence Operation Tool can remove PCR duplicates using unique molecular identifiers (UMIs), such as those used by the NEBNext Direct and HaloPlex chemistries. Removing PCR duplicates makes allele frequency estimates more accurate, which improves variant calling. UMIs are random base sequences that tag each molecule (fragment) of DNA prior to library amplification, so reads derived from the same original molecule can be recognized as PCR duplicates. Illumina instruments generate an I2 index file for paired-end runs, and this file can store the UMIs.

The tool uses the UMIs in the Illumina I2 files to identify PCR duplicates. NextGENe software finds all read pairs that share the same UMI and verifies that the template is a duplicate. From each set of duplicates it retains only the pair with the highest total quality, which is processed along with the unique read pairs. The remaining duplicate reads are removed from further processing.
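The selection logic described above can be illustrated with a minimal Python sketch. This is not NextGENe's implementation. The `ReadPair` type, its field names, and the rule used to verify a duplicate template (same UMI plus identical read sequences) are assumptions made only for illustration, since the note does not specify how that verification is performed.

```python
from collections import defaultdict
from typing import List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one paired-end read and its UMI."""
    name: str
    umi: str            # UMI taken from the I2 index read
    seq1: str           # read 1 bases
    qual1: List[int]    # read 1 per-base quality scores
    seq2: str           # read 2 bases
    qual2: List[int]    # read 2 per-base quality scores


def total_quality(pair: ReadPair) -> int:
    """Sum of all base quality scores across both reads of the pair."""
    return sum(pair.qual1) + sum(pair.qual2)


def remove_umi_duplicates(pairs: List[ReadPair]) -> List[ReadPair]:
    """Keep one read pair per (UMI, template) group: the highest-quality one.

    Assumption: two pairs come from the same template when they share a UMI
    and have identical read sequences. A real tool may use a different check.
    """
    groups = defaultdict(list)
    for pair in pairs:
        groups[(pair.umi, pair.seq1, pair.seq2)].append(pair)
    # Unique pairs form groups of size one and pass through unchanged.
    return [max(group, key=total_quality) for group in groups.values()]


pairs = [
    ReadPair("a", "ACGT", "AAAA", [30] * 4, "TTTT", [30] * 4),
    ReadPair("b", "ACGT", "AAAA", [35] * 4, "TTTT", [30] * 4),  # duplicate of "a"
    ReadPair("c", "ACGT", "CCCC", [20] * 4, "GGGG", [20] * 4),  # same UMI, other template
    ReadPair("d", "TTGA", "AAAA", [25] * 4, "TTTT", [25] * 4),  # different UMI
]
kept = remove_umi_duplicates(pairs)
print([p.name for p in kept])  # ['b', 'c', 'd']
```

In this toy input, pairs "a" and "b" share a UMI and template, so only the higher-quality "b" is kept. Pairs "c" and "d" are treated as unique and pass through.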

Figure 1: NextGENe’s Sequence Operation Tool can be used to remove PCR duplicates using UMIs from an I2 index file.

UMI-based duplicate removal can be automated by using the NextGENe AutoRun Tool to create a template containing the required analysis specifications.

The duplicate removal settings can be saved by clicking "Save" in the Sequence Operation Tool, as shown in Figure 1 above. This settings file can then be loaded in the NextGENe AutoRun Tool’s Job Editor by clicking "Add" next to "Preprocesses". Settings files for any other preprocessing steps that are needed, such as format conversion and adapter trimming, are loaded the same way, along with the Process & Report settings. The complete set of settings can then be saved as a template by clicking the "Save" button next to the "Template" field.

Figure 2: The NextGENe AutoRun Tool can be used to 1) add preprocessing steps including removal of PCR duplicates, 2) select the settings for processing of the sample, and 3) create a template to facilitate quick and easy project set-up for batch analysis.

Projects can then be configured by selecting the template and loading the sample files.

Trademarks are the property of their respective owners.