Factorized Construction of Machine Learning Methods over Normalized Data

dc.contributor.advisor: Yu, Xiaohui
dc.contributor.author: Zhang, Zhe
dc.date.accessioned: 2021-11-15T15:36:53Z
dc.date.available: 2021-11-15T15:36:53Z
dc.date.copyright: 2021-07
dc.date.issued: 2021-11-15
dc.date.updated: 2021-11-15T15:36:53Z
dc.degree.discipline: Information Systems and Technology
dc.degree.level: Master's
dc.degree.name: MA - Master of Arts
dc.description.abstract: Enterprises are adopting machine learning to extract knowledge from vast amounts of data, which are typically normalized and stored in relational databases. Features spread across different relations must be combined through join operations before being fed to machine learning processes. As a result, the redundancy that normalization avoids is reintroduced, which incurs additional costs. This thesis proposes factorized algorithms (F-GMM, F-NN and F-PPCA) for three widely used machine learning methods (GMM, NN and PPCA) that eliminate the redundancy introduced by the joins. With an exact decomposition, training can be conducted much faster without any loss in accuracy. The efficiency improvement depends on the relative redundancy of the original relations. Finally, we design extensive experiments on both synthetic and real datasets to evaluate the performance of the proposed algorithms while varying the parameters of interest. The factorized methods yield significant efficiency improvements, which grow as redundancy increases.
dc.identifier.uri: http://hdl.handle.net/10315/38755
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: Machine learning
dc.subject.keywords: Normalized data
dc.subject.keywords: Join
dc.title: Factorized Construction of Machine Learning Methods over Normalized Data
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Zhang_Zhe_2021_Masters.pdf
Size: 987.04 KB
Format: Adobe Portable Document Format