Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

Zhang, Jian

Files

Zhang_Jian_2018_MA.pdf (1.03 MB)

Date

2018-08-27

Authors

Zhang, Jian

Abstract

Research that combines data mining and machine learning with web-based education systems, known as educational data mining (EDM), is becoming imperative for improving the quality of education beyond what traditional methods allow. With the worldwide growth of information and communication technology (ICT), data are becoming available in large volumes, at high velocity, and in extensive variety. In this thesis, four popular data mining methods are run on Apache Spark, using large datasets from Online Cognitive Learning Systems, to explore Spark's scalability and efficiency. Datasets of various sizes are tested on Spark MLlib under different running configurations and parameter tunings. The thesis presents strategies for allocating computing resources and tuning parameters that take full advantage of Apache Spark's in-memory system when carrying out data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.

Keywords

Educational technology

URI

http://hdl.handle.net/10315/35023

Collections

Information Systems and Technology
