Bridging Data Management and Machine Learning: Case Studies on Index, Query Optimization, and Data Acquisition

dc.contributor.advisor: Yu, Xiaohui
dc.contributor.author: Li, Yifan
dc.date.accessioned: 2022-12-14T16:26:21Z
dc.date.available: 2022-12-14T16:26:21Z
dc.date.copyright: 2022-06-29
dc.date.issued: 2022-12-14
dc.date.updated: 2022-12-14T16:26:21Z
dc.degree.discipline: Electrical Engineering & Computer Science
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: Data management tasks and techniques arise in a variety of real-world scenarios, including web search, business analysis, traffic scheduling, and advertising. Although data management has been studied as a research area for decades, recent breakthroughs in Machine Learning (ML) offer new perspectives for defining and tackling its problems; at the same time, the insights embodied in data management techniques can greatly accelerate the advancement of ML. In this work, we focus on the intersection of data management and ML and study several important, interesting, and challenging problems in that area. Specifically, our work concentrates on three topics: (1) leveraging the ability of ML models to capture data distributions in order to design lightweight, data-adaptive indexes and search algorithms that accelerate similarity search over large-scale data; (2) designing robust and trustworthy approaches that improve the reliability of both conventional and learned query optimizers and boost DBMS performance; (3) developing data management techniques with statistical guarantees that acquire the most useful training data for ML models under a budget constraint, with the goal of maximizing model accuracy. We conduct a detailed theoretical and empirical study of each topic, formalizing these fundamental problems and developing efficient and effective approaches to them.
dc.identifier.uri: http://hdl.handle.net/10315/40655
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: Data management
dc.subject.keywords: Machine learning
dc.subject.keywords: Index
dc.subject.keywords: Query optimization
dc.subject.keywords: Data acquisition
dc.title: Bridging Data Management and Machine Learning: Case Studies on Index, Query Optimization, and Data Acquisition
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Li_Yifan_2022_PhD.pdf
Size: 1.63 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text
Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text