Using Text Mining of PubMed Abstracts As An Evidence Source in Computational Predictions of WW Domain-Mediated Protein-Protein Interactions

dc.contributor.advisor: Pearlman, Ronald E
dc.creator: Olhovsky, Marina
dc.date.accessioned: 2016-09-20T16:22:54Z
dc.date.available: 2016-09-20T16:22:54Z
dc.date.copyright: 2015-08-25
dc.date.issued: 2016-09-20
dc.date.updated: 2016-09-20T16:22:54Z
dc.degree.discipline: Biology
dc.degree.level: Master's
dc.degree.name: MSc - Master of Science
dc.description.abstract: Protein-protein interactions (PPIs) are a key regulatory mechanism that coordinates many processes vital to normal cellular function. A number of small-scale and high-throughput wet-lab methods can identify PPIs accurately, but they are costly in both time and money. Complementing experimental methods with computational predictions makes small-scale wet-lab work more effective at identifying high-quality protein interaction networks. Computational predictions are made either by applying bioinformatics and machine-learning algorithms to large training sets obtained from wet-lab experiments, or by extracting PPI information from large volumes of published data that do not directly identify protein interactions but are nonetheless correlated with them. A disadvantage of computational predictions is their low accuracy: they produce many false positives and false negatives. To improve accuracy, it is important to consider interactions that are likely to occur in vivo under specific biological conditions, termed the context. One way to do this is to analyze data from different types of experiments that capture different features of the co-occurring proteins, such as co-localization, co-expression, correlated mutations, or semantic similarity. These experimental sources and the data they produce are called sources of evidence, and integrating data from multiple independent sources of evidence improves prediction accuracy. In this work, I used text mining of PubMed abstracts as an evidence source for protein interactions. I hypothesized that proteins whose names are frequently mentioned in the same abstract are more likely to interact in vivo than randomly chosen proteins.
A comparison of three text-mining techniques (gene name co-occurrence, MeSH term indexing, and gene name co-occurrence with a controlled vocabulary) shows that co-occurrence with a controlled vocabulary yields the highest precision and recall. I therefore concluded that gene name co-occurrence with a controlled vocabulary can be used as a novel evidence source for predicting WW domain-mediated PPIs.
dc.identifier.uri: http://hdl.handle.net/10315/32083
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Bioinformatics
dc.subject.keywords: Text mining
dc.subject.keywords: Protein interaction
dc.subject.keywords: Python
dc.subject.keywords: Precision
dc.subject.keywords: Recall
dc.subject.keywords: WW domain
dc.title: Using Text Mining of PubMed Abstracts As An Evidence Source in Computational Predictions of WW Domain-Mediated Protein-Protein Interactions
dc.type: Electronic Thesis or Dissertation
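
The co-occurrence hypothesis in the abstract can be illustrated with a minimal Python sketch (Python appears among the record's keywords). It is not the thesis's implementation: the controlled vocabulary, the placeholder gene symbols, the toy abstracts, the reference set, and the `min_count` threshold are all hypothetical. Each abstract is scanned for gene mentions using a vocabulary of synonyms, every pair mentioned together in at least `min_count` abstracts is predicted to interact, and the predictions are scored with precision and recall against a reference set of known interactions.

```python
import re
from collections import Counter
from itertools import combinations

# HYPOTHETICAL controlled vocabulary: each canonical symbol maps to the
# synonyms that count as a mention of that gene. Placeholder names only.
VOCABULARY = {
    "GENE1": ["GENE1", "ALIAS1"],
    "GENE2": ["GENE2"],
    "GENE3": ["GENE3"],
    "GENE4": ["GENE4"],
}

# Toy abstracts and a HYPOTHETICAL reference set of known interactions.
# Pairs are stored as alphabetically sorted tuples so they compare equal.
TOY_ABSTRACTS = [
    "GENE1 binds GENE2 through its WW domain.",
    "ALIAS1 and GENE2 co-localize at the membrane.",
    "GENE3 expression was unchanged; GENE1 was not tested.",
    "GENE2 and GENE3 were both detected.",
]
REFERENCE = {("GENE1", "GENE2"), ("GENE2", "GENE4")}


def compile_patterns(vocabulary):
    """Build one case-sensitive, whole-word regex per canonical symbol."""
    patterns = {}
    for symbol, synonyms in vocabulary.items():
        alternatives = "|".join(re.escape(s) for s in synonyms)
        patterns[symbol] = re.compile(rf"\b(?:{alternatives})\b")
    return patterns


def count_cooccurrences(abstracts, vocabulary):
    """Count, for each gene pair, the abstracts that mention both genes."""
    patterns = compile_patterns(vocabulary)
    counts = Counter()
    for text in abstracts:
        # Sorting makes every pair come out as an alphabetically sorted tuple.
        found = sorted(s for s, p in patterns.items() if p.search(text))
        counts.update(combinations(found, 2))
    return counts


def predict(counts, min_count):
    """Predict an interaction for every pair co-mentioned >= min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}


def precision_recall(predicted, reference):
    """Precision = TP / |predicted|, recall = TP / |reference|."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    counts = count_cooccurrences(TOY_ABSTRACTS, VOCABULARY)
    for threshold in (1, 2):
        predicted = predict(counts, threshold)
        print(threshold, precision_recall(predicted, REFERENCE))
```

On the toy data, a threshold of 2 predicts only the GENE1-GENE2 pair (precision 1.0, recall 0.5), while a threshold of 1 also admits two pairs absent from the reference set, lowering precision to 1/3 with recall unchanged. Raising the threshold can only shrink the predicted set, so recall never increases. Retrieving real abstracts from PubMed is outside the scope of this sketch.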

Files

Original bundle
Olhovsky_Marina_2015_Masters.pdf (8.84 MB, Adobe Portable Document Format)
Appendix_J_Database_diagram.png (225 KB, Portable Network Graphics)

License bundle
license.txt (1.83 KB, Plain Text)
YorkU_ETDlicense.txt (3.38 KB, Plain Text)
