dc.contributor.advisor: Litoiu, Marin
dc.creator: Sandel, Roni
dc.description.abstract: Cloud computing is increasingly being adopted because of its cost savings and its ability to scale. As data continues to grow rapidly, an increasing number of institutions are adopting NoSQL (non-relational) clusters to meet the storage and processing demands of large datasets. However, evaluating and modelling NoSQL clusters presents many challenges. To address some of these challenges, this thesis proposes a methodology for designing and modelling large-scale processing configurations that meet end-user requirements. First, goals are established for the big data cluster; in this thesis, we use performance and cost as our goals. Second, the data is transformed from a relational schema into an appropriate HBase schema. Third, we iteratively deploy different clusters. We then model the clusters and evaluate different topologies (instance size, number of instances, number of clusters, etc.). We use HBase as the large-scale data processing platform, and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.title: An Approach to Designing Clusters for Large Data Processing
dc.type: Electronic Thesis or Dissertation [en_US]
Systems and Technology - Master of Arts
Master's
dc.subject.keywords: Big data
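
The abstract's second step, transforming relational data into an HBase schema, can be illustrated with a minimal sketch. It assumes a hypothetical relational traffic table with columns (sensor_id, timestamp, speed, volume); the row-key layout, the column family "m", and the helper names are illustrative assumptions, not the schema used in the thesis. A plain Python dict stands in for an HBase table (no HBase client is used), and sorting by key emulates HBase's lexicographic row-key order.

```python
# Hypothetical relational rows: (sensor_id, timestamp_epoch_s, speed_kmh, volume).
# Column names and values are illustrative assumptions only.
rows = [
    ("S001", 1_500_000_000, 42.0, 17),
    ("S001", 1_500_000_060, 38.5, 21),
    ("S002", 1_500_000_000, 55.0, 9),
]

MAX_TS = 10**10  # assumption: timestamps are positive and fit in 10 digits


def row_key(sensor_id: str, ts: int) -> str:
    """Composite row key: sensor id first so one sensor's readings are
    contiguous, then a zero-padded reversed timestamp so the newest
    reading sorts first within that sensor."""
    assert 0 < ts < MAX_TS
    return f"{sensor_id}#{MAX_TS - ts:010d}"


def to_hbase_layout(rows):
    """Denormalize relational rows into {row_key: {"family:qualifier": bytes}}."""
    table = {}
    for sensor_id, ts, speed, volume in rows:
        table[row_key(sensor_id, ts)] = {
            "m:speed": str(speed).encode(),    # hypothetical column family "m"
            "m:volume": str(volume).encode(),
        }
    # HBase keeps rows sorted by row key; emulate that ordering here.
    return dict(sorted(table.items()))


def scan_prefix(table, prefix):
    """Emulate a prefix scan: all rows whose key starts with `prefix`."""
    return [(k, v) for k, v in table.items() if k.startswith(prefix)]


table = to_hbase_layout(rows)
for key, cells in scan_prefix(table, "S001#"):
    print(key, cells)
```

Putting the sensor id before the reversed timestamp turns a typical query ("recent readings for one sensor") into a single contiguous prefix scan; a real design would also weigh region hotspotting and the query mix, which this sketch does not model.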

All items in the YorkSpace institutional repository are protected by copyright, with all rights reserved except where explicitly noted.