Soft clipping based on quality scores
If the base call quality is less than or equal to the quality score that is specified in the Alignment Settings for a project, then the 3’ ends of reads are soft-clipped. If mismatched bases are found near the end of the alignment, then soft clipping is also carried out on the 3’ ends of reads according to the following:
1. Starting at the end of the alignment sequence, move towards the middle of the alignment sequence and calculate a running quality score for each nucleotide as follows:
Add 1 for a matched base and subtract 3 for a mismatched base. A quality score of less than -6 is not allowed.
2. Continue calculating a quality score for each base in the alignment sequence until a nucleotide with a quality score of 6 is found.
3. Move back from this position towards the end of the read until a mismatch is found.
4. Soft clip from this mismatch through the end of the read.
For example, if the alignment results in a CIGAR string of 100=2X10=1X3=, then:
A score is calculated going back from the end: 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6.
Moving back towards the end, soft clipping starts at the first mismatch and continues through the end of the read, which results in a total of four bases being soft-clipped: 100=2X10=4S.
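The mismatch-based procedure and the example above can be sketched in Python as follows. This is an illustrative sketch only, not the software's actual implementation. The function names (expand_cigar, compress_ops, soft_clip_3prime) are hypothetical, the CIGAR string is assumed to contain only = and X operations, the lower limit of -6 is assumed to act as a floor on the running score, and the read is assumed to be left unchanged when the score never reaches 6 or when no mismatch is found.

```python
import re
from itertools import groupby


def expand_cigar(cigar):
    """Expand a CIGAR string into one operation per base (assumes only = and X)."""
    ops = []
    for length, op in re.findall(r"(\d+)([=X])", cigar):
        ops.extend(op * int(length))
    return ops


def compress_ops(ops):
    """Collapse a per-base operation list back into a CIGAR string."""
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))


def soft_clip_3prime(cigar, stop_score=6, floor=-6):
    """Hypothetical sketch of the mismatch-based soft clipping described above."""
    ops = expand_cigar(cigar)
    score = 0
    stop_index = None

    # Steps 1 and 2: walk from the end of the alignment towards the middle,
    # adding 1 for a match and subtracting 3 for a mismatch, until the
    # running score reaches stop_score.
    for i in range(len(ops) - 1, -1, -1):
        score += 1 if ops[i] == "=" else -3
        score = max(score, floor)  # assumption: scores below the floor are clamped
        if score >= stop_score:
            stop_index = i
            break

    if stop_index is None:
        return cigar  # assumption: nothing is clipped if the score never reaches 6

    # Step 3: move back towards the end of the read until a mismatch is found.
    for j in range(stop_index, len(ops)):
        if ops[j] == "X":
            # Step 4: soft clip from this mismatch through the end of the read.
            clipped = len(ops) - j
            return compress_ops(ops[:j]) + f"{clipped}S"

    return cigar  # assumption: no mismatch near the end, so nothing is clipped


print(soft_clip_3prime("100=2X10=1X3="))  # 100=2X10=4S
```

In this sketch the two mismatches in 2X are never reached, because the running score reaches 6 partway through the preceding run of ten matches.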