Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

Title: Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets
Author: Zhang, Jian
Abstract: Research that combines data mining and machine learning with web-based education systems (known as educational data mining, or EDM) is becoming essential for improving the quality of education beyond what traditional methods allow. With the worldwide growth of Information and Communication Technology (ICT), data are becoming available in large volumes, at high velocity, and in wide variety. In this thesis, four popular data mining methods are run on Apache Spark against large datasets from Online Cognitive Learning Systems to explore Spark's scalability and efficiency. Datasets of various sizes are tested on Spark MLlib under different run configurations and parameter settings. The thesis presents practical strategies for allocating computing resources and tuning parameters to take full advantage of Apache Spark's in-memory processing for data mining and machine learning tasks. It also offers insights that education experts and data scientists can use to manage and improve the quality of education, and to analyze data and discover hidden knowledge in the era of big data.
Subject: Educational technology
Keywords: Spark
Big data
Data mining
Apache Spark
Educational data mining
Type: Electronic Thesis or Dissertation
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
URI: http://hdl.handle.net/10315/35023
Supervisor: Yang, Zijiang Cynthia
Degree: MA - Master of Arts
Program: Information Systems and Technology
Exam date: 2018-04-17
Publish on: 2018-08-27