NextGENe Software’s Sequence Operation Tool can remove PCR duplicates using unique molecular identifiers (UMIs), such as those used in the NEBNext Direct or HaloPlex chemistries. UMIs are random sequences of bases used to tag each molecule (fragment) of DNA before library amplification, which makes it possible to identify reads that are PCR duplicates of the same original molecule. Removing PCR duplicates improves the accuracy of allele frequency estimates, which in turn improves variant calling. Illumina instruments generate an I2 index file for paired-end runs, and this I2 index file can store the UMIs.
The tool uses the UMIs in the Illumina I2 file to identify PCR duplicates. NextGENe groups all read pairs that share the same UMI and keeps only the pair with the highest total quality from each group. That pair is processed along with the unique read pairs, and the remaining duplicates are excluded from further processing.
Figure 1: NextGENe’s Sequence Operation Tool can be used to remove PCR duplicates using UMIs from an I2 index file.
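The selection rule described above (group read pairs by UMI, keep the highest-quality pair in each group) can be illustrated with a short Python sketch. This is not NextGENe's implementation. It assumes that the R1, R2 and I2 FASTQ files list reads in the same order, that the entire I2 read sequence is the UMI, that "total quality" means the sum of Phred+33 scores across both reads, and that ties keep the first pair seen. The names `ReadPair`, `read_fastq`, `pairs_from_fastq`, `total_quality` and `remove_umi_duplicates` are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

@dataclass
class ReadPair:
    name: str
    r1_seq: str
    r1_qual: str  # Phred+33 quality string for read 1
    r2_seq: str
    r2_qual: str  # Phred+33 quality string for read 2
    umi: str      # assumption: the whole matching I2 read is the UMI

def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield header[1:].split()[0], seq, qual

def pairs_from_fastq(r1: TextIO, r2: TextIO, i2: TextIO) -> Iterator[ReadPair]:
    """Zip R1, R2 and I2 records; assumes all three files list reads in the same order."""
    for (name, s1, q1), (_, s2, q2), (_, umi, _) in zip(
        read_fastq(r1), read_fastq(r2), read_fastq(i2)
    ):
        yield ReadPair(name, s1, q1, s2, q2, umi)

def total_quality(pair: ReadPair, offset: int = 33) -> int:
    """Sum of Phred scores across both reads (one possible meaning of 'total quality')."""
    return sum(ord(c) - offset for c in pair.r1_qual + pair.r2_qual)

def remove_umi_duplicates(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep one pair per UMI: the pair with the highest total quality (first seen wins ties)."""
    best: Dict[str, ReadPair] = {}
    for pair in pairs:
        kept = best.get(pair.umi)
        if kept is None or total_quality(pair) > total_quality(kept):
            best[pair.umi] = pair
    return list(best.values())

# Tiny example: pairs "a" and "b" share a UMI, so only the higher-quality "a" is kept.
r1 = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACGT\n+\n####\n@c\nCCGT\n+\nIIII\n")
r2 = io.StringIO("@a\nTTGA\n+\nIIII\n@b\nTTGA\n+\nIIII\n@c\nGGGA\n+\nIIII\n")
i2 = io.StringIO("@a\nGATTACA\n+\nIIIIIII\n@b\nGATTACA\n+\nIIIIIII\n@c\nCTTAGGC\n+\nIIIIIII\n")
kept = remove_umi_duplicates(pairs_from_fastq(r1, r2, i2))
print(sorted(p.name for p in kept))  # ['a', 'c']
```

Grouping by UMI alone mirrors the description above. Other duplicate-removal tools may also compare alignment positions or tolerate sequencing errors in the UMI, and this sketch does neither.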
Duplicate removal using UMIs can be automated with the NextGENe AutoRun Tool by creating a template that contains the required analysis settings.