Using Text Mining of PubMed Abstracts As An Evidence Source in Computational Predictions of WW Domain-Mediated Protein-Protein Interactions

Date

2016-09-20

Authors

Olhovsky, Marina

Abstract

Protein-protein interactions (PPIs) are a key regulatory mechanism that coordinates a multitude of processes vital to normal cellular function. A number of small-scale and high-throughput wet-lab methods identify PPIs accurately; however, these methods are expensive in both time and money. Complementing experimental methods with computational predictions increases the effectiveness of small-scale wet-lab methodologies in identifying high-quality protein interaction networks. Computational predictions are made by applying bioinformatics and machine-learning algorithms to large-scale training sets obtained from wet-lab experiments, or by extracting information on PPIs from high volumes of published data that do not directly identify protein interactions but are nonetheless correlated with them. A disadvantage of computational predictions is their high error rate, namely too many false positives and false negatives. To improve the accuracy of computational predictions, it is important to consider interactions that are likely to occur in vivo under certain biological conditions, termed context. One technique for improving prediction accuracy is to analyze data obtained from different types of experiments that consider different features of the co-occurring proteins, such as co-localization, co-expression, correlated mutations, or semantic similarity. These experimental sources and their resulting data are called sources of evidence. Integrating data from multiple independent supporting evidence sources improves prediction accuracy.

In this work, I used text mining of PubMed abstracts as an evidence source for protein interactions. I hypothesized that proteins whose names are frequently mentioned in the same abstract are more likely to interact in vivo than randomly chosen proteins. A comparison of three text mining techniques (gene name co-occurrence, MeSH term indexing, and co-occurrence with a controlled vocabulary) shows that co-occurrence with a controlled vocabulary yields the highest precision and recall. I concluded that gene name co-occurrence with a controlled vocabulary can, therefore, be used as a novel evidence source for prediction of WW domain-mediated PPIs.
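To make the idea of gene name co-occurrence concrete, the following is a minimal sketch and not the method used in the thesis. It counts how many abstracts mention both members of each protein pair, predicts an interaction when that count reaches a threshold, and scores the predictions against a reference set using precision and recall. The protein names (`PROTA`, `PROTB`, `PROTC`), the toy abstracts, the reference set, the threshold `MIN_COUNT`, and both function names are hypothetical placeholders chosen only for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, protein_names):
    """Count, for each unordered protein pair, the abstracts that mention both."""
    names = [name.lower() for name in protein_names]
    counts = Counter()
    for text in abstracts:
        # Naive tokenization: lowercase alphanumeric runs (an assumption;
        # real gene-name recognition must handle synonyms and ambiguity).
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        present = sorted(name for name in names if name in tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def precision_recall(predicted, known):
    """Precision and recall of a predicted pair set against a reference set."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy, invented data (hypothetical; not taken from PubMed or the thesis).
abstracts = [
    "PROTA binds PROTB through its WW domain.",
    "PROTA and PROTB colocalize in the cytoplasm.",
    "PROTC is unrelated to PROTA in this assay.",
    "PROTB expression was measured.",
]
protein_names = ["PROTA", "PROTB", "PROTC"]
known_interactions = {("prota", "protb"), ("protb", "protc")}

MIN_COUNT = 2  # hypothetical threshold for calling a pair "interacting"

counts = cooccurrence_counts(abstracts, protein_names)
predicted = {pair for pair, n in counts.items() if n >= MIN_COUNT}
precision, recall = precision_recall(predicted, known_interactions)
print(counts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch, raising `MIN_COUNT` would tend to trade recall for precision. Restricting matches to abstracts that also contain terms from a controlled vocabulary, as the abstract describes, would be an additional filter applied before counting.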

Keywords

Bioinformatics
