Factorized Construction of Machine Learning Methods over Normalized Data

Yu, Xiaohui; Zhang, Zhe
Dates: 2021-07; 2021-11-15
URI: http://hdl.handle.net/10315/38755

Abstract:
Enterprises are adopting machine learning to extract knowledge from the vast amounts of data they store, normalized, in relational databases. Because the features needed for learning are spread across different relations, they must be combined through join operations before being fed to machine learning processes. As a result, the redundancy that normalization avoids is reintroduced, which incurs additional costs. This thesis proposes factorized algorithms, F-GMM, F-NN and F-PPCA, that eliminate the redundancy introduced by joins for three widely used machine learning methods: Gaussian mixture models (GMM), neural networks (NN) and probabilistic principal component analysis (PPCA). Because the decomposition is exact, training runs much faster with no loss of accuracy. The efficiency improvement depends on the relative redundancy of the original relations. Finally, we design extensive experiments on both synthetic and real datasets to evaluate the proposed algorithms while varying the parameters of interest. The factorized method yields significant efficiency improvements, which grow as redundancy increases.

Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
Subject: Computer science
Type: Electronic Thesis or Dissertation
Keywords: Machine learning; Normalized data; Join
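The core idea of factorized learning, computing over the normalized relations instead of the joined table, can be illustrated with a minimal sketch. It assumes a key-foreign-key join between a fact table S (feature rows `xs`, foreign keys `fk`) and a dimension table R (feature rows `xr`), and computes the Gram matrix X^T X of the joined feature matrix X = [X_S | X_R[fk]], a statistic of the kind used in covariance estimation for PCA-style methods. The matrix splits into an S-S block, an R-R block weighted by how many S rows reference each R row, and an S-R block built from S features grouped by foreign key, so each R row is processed once rather than once per matching S row. The function names and the list-of-lists layout are hypothetical, and this is a generic illustration, not the thesis's F-GMM, F-NN or F-PPCA algorithm.

```python
def matmul_t(a, b):
    """Return a^T b for row-major lists a (n x p) and b (n x q)."""
    p, q = len(a[0]), len(b[0])
    out = [[0.0] * q for _ in range(p)]
    for row_a, row_b in zip(a, b):
        for i in range(p):
            ai = row_a[i]
            for j in range(q):
                out[i][j] += ai * row_b[j]
    return out


def materialized_gram(xs, fk, xr):
    """Baseline: physically build the join, repeating R features per S row."""
    joined = [s + xr[k] for s, k in zip(xs, fk)]
    return matmul_t(joined, joined)


def factorized_gram(xs, fk, xr):
    """X^T X of X = [xs | xr[fk]] computed without materializing the join.

    Hypothetical helper for illustration:
    xs: n fact-table feature rows; fk: n indices into xr;
    xr: m dimension-table feature rows.
    """
    ds, dr = len(xs[0]), len(xr[0])
    # S-S block: computed directly on the fact table.
    ss = matmul_t(xs, xs)
    # Count references to each R row and sum S features grouped by key.
    counts = [0] * len(xr)
    grouped = [[0.0] * ds for _ in xr]
    for row, k in zip(xs, fk):
        counts[k] += 1
        for i in range(ds):
            grouped[k][i] += row[i]
    # R-R block: each R row contributes once, weighted by its count.
    rr = [[0.0] * dr for _ in range(dr)]
    for c, row in zip(counts, xr):
        for i in range(dr):
            for j in range(dr):
                rr[i][j] += c * row[i] * row[j]
    # S-R block: grouped S sums times the R row they join with (ds x dr).
    sr = matmul_t(grouped, xr)
    # Assemble the symmetric (ds + dr) x (ds + dr) result.
    top = [ss[i] + sr[i] for i in range(ds)]
    bottom = [[sr[j][i] for j in range(ds)] + rr[i] for i in range(dr)]
    return top + bottom


if __name__ == "__main__":
    xs = [[1.0], [2.0], [3.0], [4.0]]
    fk = [0, 0, 1, 1]
    xr = [[10.0, 1.0], [20.0, 2.0]]
    print(factorized_gram(xs, fk, xr))
    # → [[30.0, 170.0, 17.0], [170.0, 1000.0, 100.0], [17.0, 100.0, 10.0]]
    assert factorized_gram(xs, fk, xr) == materialized_gram(xs, fk, xr)
```

Because the block decomposition is exact, both functions return the same matrix; the work saved in the R-R block grows with the number of S rows that reference each R row, which is one way to see why the benefit of factorization tracks the redundancy of the join.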