Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

dc.contributor.advisor: Yang, Zijiang Cynthia
dc.creator: Zhang, Jian
dc.date.accessioned: 2018-08-27T16:42:44Z
dc.date.available: 2018-08-27T16:42:44Z
dc.date.copyright: 2018-04-17
dc.date.issued: 2018-08-27
dc.date.updated: 2018-08-27T16:42:44Z
dc.degree.discipline: Information Systems and Technology
dc.degree.level: Master's
dc.degree.name: MA - Master of Arts
dc.description.abstract: Research into the combination of data mining and machine learning technology with web-based education systems (known as education data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of the Information Communication Technology (ICT), data are becoming available at a significantly large volume, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied to Apache Spark, using large volumes of datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Various volumes of datasets are tested on Spark MLlib with different running configurations and parameter tunings. The thesis convincingly presents useful strategies for allocating computing resources and tuning to take full advantage of the in-memory system of Apache Spark to conduct the tasks of data mining and machine learning. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.
dc.identifier.uri: http://hdl.handle.net/10315/35023
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Educational technology
dc.subject.keywords: Spark
dc.subject.keywords: Big data
dc.subject.keywords: Data mining
dc.subject.keywords: Apache Spark
dc.subject.keywords: Educational data mining
dc.title: Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Zhang_Jian_2018_MA.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.4 KB
Format: Plain Text