Assessment of 13 in silico pathogenicity methods on cancer-related variants
Highlights
- Pathogenicity prediction frequencies for variants of unknown significance (VUS) are close to those for variants with known clinical significance.
- MutPred, MetaSVM and Revel have the highest discriminatory power among the 13 in silico prediction tools in the analysis of cancer-related variants.
- Supervised ensemble learning methods outperform sequence- and structure-based methods.
Abstract
Single nucleotide variants (SNVs) are single-base substitutions that can influence many biological functions in the cell, including gene expression, protein folding, and protein-protein interactions. Predicting the functional effects of cancer-related variants is therefore crucial for anticipating drug responses and selecting treatment options in clinical oncology. Because experimental identification of these effects can be slow, inefficient, and inconvenient, in silico methods are gaining popularity for predicting variant effects. Although many studies have examined cancer variants, to date none has assessed the performance of in silico pathogenicity methods in predicting the functional relevance of cancer variants obtained from ClinVar. To this end, we examined pathogenicity predictions for cancer-related variant datasets of 8 cancer types (bladder, breast, colon, colorectal, kidney, liver, lung, and pancreatic cancer) retrieved from ClinVar, using 13 in silico methods: SIFT, CADD, FATHMM-weighted, FATHMM-unweighted, GERP++, MetaSVM, Mutation Assessor, MutationTaster, MutPred, PolyPhen-2, Provean, Revel, and VEST4. Combined analysis of statistical performance metrics, prediction distribution frequencies, and ROC curves suggested that, among all the in silico prediction tools, the three with the highest discriminatory power were MutPred (AUC = 0.677), MetaSVM (AUC = 0.645), and Revel (AUC = 0.637).
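The kind of ROC and performance-metric evaluation summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each tool's output has been converted to a numeric score in which higher values mean "more likely pathogenic" (SIFT, for example, assigns low scores to deleterious substitutions, so its scores would need to be inverted first), and that ClinVar clinical significance has been collapsed to 1 (pathogenic) and 0 (benign), with VUS excluded. The function names `roc_auc` and `confusion_metrics`, the threshold, and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen pathogenic variant (label 1)
    receives a higher score than a randomly chosen benign variant (label 0).
    Ties count as half a win (Mann-Whitney formulation).
    Assumes higher score = more likely pathogenic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy when every variant scoring at or
    above `threshold` is called pathogenic (hypothetical cut-off)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        called_pathogenic = s >= threshold
        if y == 1 and called_pathogenic:
            tp += 1
        elif y == 1:
            fn += 1
        elif called_pathogenic:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both pathogenic and benign variants are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Toy scores for six hypothetical variants (not data from this study).
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc(scores, labels))             # 8 of 9 pairs ranked correctly
    print(confusion_metrics(scores, labels, 0.5))
```

On the toy data, 8 of the 9 pathogenic-benign pairs are ranked correctly, giving an AUC of about 0.889, while an AUC of 0.5 corresponds to random ranking. The pairwise loop is quadratic in dataset size and is meant only to make the definition explicit; for large variant sets a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.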