Factorized Construction of Machine Learning Methods over Normalized Data

Zhang, Zhe

Files

Zhang_Zhe_2021_Masters.pdf (987.04 KB)

Date

2021-11-15

Authors

Zhang, Zhe

Abstract

Enterprises are adopting machine learning to gain knowledge from vast amounts of data, which are typically normalized and stored in relational databases. Features spread across different relations must be combined through join operations before they are fed to machine learning processes. As a result, the redundancy that normalization avoided is reintroduced, which incurs additional computation and storage costs. This thesis proposes factorized algorithms (F-GMM, F-NN and F-PPCA) for three widely used machine learning methods (GMM, NN and PPCA) that eliminate the redundancy introduced by the joins. Because the decomposition is exact, training can be conducted much faster without any loss in accuracy. The efficiency improvement depends on the relative redundancy of the original relations. Finally, we design extensive experiments on both synthetic and real datasets to evaluate the performance of the proposed algorithms while varying the parameters of interest. The factorized methods yield significant efficiency improvements, which increase as redundancy grows.
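The general idea of avoiding join redundancy can be illustrated with a small sketch. This is not the thesis's F-GMM, F-NN or F-PPCA algorithm; it only shows, under assumed names and sizes (the tables `S` and `R`, the foreign key `fk`, and the weight vectors `w_s` and `w_r` are all hypothetical), how a per-row linear score over a joined table can be computed exactly without materializing the join.

```python
import random

random.seed(0)

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical normalized data: table S holds a foreign key into table R.
n_s, n_r, d_s, d_r = 1000, 10, 3, 4
S = [[random.gauss(0, 1) for _ in range(d_s)] for _ in range(n_s)]
R = [[random.gauss(0, 1) for _ in range(d_r)] for _ in range(n_r)]
fk = [random.randrange(n_r) for _ in range(n_s)]  # row i of S joins row fk[i] of R

# Hypothetical weights, split by the table each feature comes from.
w_s = [random.gauss(0, 1) for _ in range(d_s)]
w_r = [random.gauss(0, 1) for _ in range(d_r)]

# Materialized approach: build the join, copying an R row for every S row.
joined = [s + R[k] for s, k in zip(S, fk)]
scores_joined = [dot(row, w_s + w_r) for row in joined]

# Factorized approach: compute the R-side part once per distinct R row,
# then look it up through the foreign key instead of recomputing it.
partial_r = [dot(r, w_r) for r in R]
scores_factorized = [dot(s, w_s) + partial_r[k] for s, k in zip(S, fk)]

# Both approaches agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(scores_joined, scores_factorized))
```

In this sketch the R-side work drops from 1000 inner products to 10, because each R row appears many times in the join. The saving shrinks as the number of R rows approaches the number of S rows, which matches the abstract's remark that the improvement depends on the relative redundancy of the original relations.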

Keywords

Computer science

URI

http://hdl.handle.net/10315/38755

Collections

Information Systems and Technology
