Bridging Data Management and Machine Learning: Case Studies on Index, Query Optimization, and Data Acquisition

Yu, Xiaohui
Li, Yifan
2022-06-29
2022-12-14
http://hdl.handle.net/10315/40655

Data management tasks and techniques appear in a variety of real-world scenarios, including web search, business analysis, traffic scheduling, and advertising, to name a few. While data management has been studied as a research area for decades, recent breakthroughs in Machine Learning (ML) provide new perspectives for defining and tackling problems in the area. At the same time, the wisdom embodied in data management techniques greatly helps to accelerate the advancement of Machine Learning. In this work, we focus on the intersection of data management and Machine Learning, and study several important, interesting, and challenging problems. More specifically, our work concentrates on the following three topics: (1) leveraging the ability of ML models to capture data distributions in order to design lightweight, data-adaptive indexes and search algorithms that accelerate similarity search over large-scale data; (2) designing robust and trustworthy approaches to improve the reliability of both conventional and learned query optimizers, and to boost the performance of the DBMS; (3) developing data management techniques with statistical guarantees to acquire the most useful training data for ML models under a budget constraint, striving to maximize the accuracy of the model. We conduct detailed theoretical and empirical studies for each topic, establishing these fundamental problems and developing efficient and effective approaches for the tasks.

Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.

Computer science
Electronic Thesis or Dissertation
Data management, Machine learning, Index, Query optimization, Data acquisition