Benchmarking of sequencing analysis
Advances in next-generation sequencing (NGS) technologies have made it possible to identify the genetic profiles of living organisms quickly and affordably. As these technologies spread through research labs and hospitals, establishing the reproducibility of their results becomes imperative. Because the detected variants are sensitive to many factors throughout the analysis, reproducibility can only be established by evaluating those variants thoroughly against multiple criteria. Practical choices such as read trimming, recalibration, duplicate handling, and computing environment settings can significantly affect the results, yet most reproducibility assessments of computational analyses focus only on comparing read mapping and/or variant calling strategies.
In this project, we assess the stability of NGS pipelines by customizing them: including or omitting analysis steps such as trimming, and switching between alternative settings such as duplicate handling strategies. We used the raw whole-exome sequencing (WES) data produced by the SEQC2 group and evaluated the results against a ground-truth variant list produced by the same group. We also tested the effects of the operating system, the hardware configuration (CPU and RAM), and the versions of the tools used in the pipelines, and we measured the running time of each step. The project is ongoing.
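The evaluation against the ground-truth list can be illustrated with a minimal Python sketch. It is not the project's actual implementation: it assumes variants are compared by exact match on (chromosome, position, reference allele, alternate allele), and the names `Variant`, `score_calls`, `OPTIONS`, and `pipeline_configs`, as well as the option values, are hypothetical. It enumerates every combination of pipeline settings and scores one call set with precision, recall, and F1.

```python
from itertools import product
from typing import Iterator, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key; real VCF records carry much more."""
    chrom: str
    pos: int
    ref: str
    alt: str


def score_calls(calls: set, truth: set) -> dict:
    """Score a call set against a truth set using exact variant matching."""
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical pipeline options: each key is a step, each list its alternatives.
OPTIONS = {
    "trimming": [True, False],
    "duplicates": ["mark", "remove", "keep"],
    "recalibration": [True, False],
}


def pipeline_configs(options: dict) -> Iterator[dict]:
    """Yield one settings dict per combination of option values."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))


# Toy data standing in for a ground-truth list and one pipeline's calls.
truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr2", 50, "G", "A")}
calls = {Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "A"),
         Variant("chr3", 10, "T", "C")}

if __name__ == "__main__":
    print(len(list(pipeline_configs(OPTIONS))), "configurations")
    print(score_calls(calls, truth))
```

In practice each configuration would produce its own call set, and the scores would be compared across configurations. Exact matching is a simplification: the same variant can be written in different ways, which is why dedicated comparison tools such as hap.py or RTG vcfeval normalize variant representations before matching.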