DSpace Repository

An Approach to Designing Clusters for Large Data Processing

Title: An Approach to Designing Clusters for Large Data Processing
Author: Sandel, Roni
Abstract: Cloud computing is increasingly being adopted because of its cost savings and ability to scale. As data continues to grow rapidly, an increasing number of institutions are adopting non-standard SQL clusters to address the storage and processing demands of large data. However, evaluating and modelling non-SQL clusters presents many challenges. To address some of these challenges, this thesis proposes a methodology for designing and modelling large-scale processing configurations that respond to end-user requirements. First, goals are established for the big data cluster; in this thesis, we use performance and cost as our goals. Second, the data is transformed from a relational data schema to an appropriate HBase schema. Third, we iteratively deploy different clusters, then model them and evaluate different topologies (size of instances, number of instances, number of clusters, etc.). We use HBase as the large data processing cluster, and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.
Keywords: Cloud; Big data
Type: Electronic Thesis or Dissertation
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
URI: http://hdl.handle.net/10315/29952
Supervisor: Litoiu, Marin
Degree: MA - Master of Arts
Program: Information Systems and Technology
Exam date: 2014-12-02
Publish on: 2015-08-28
