An Approach to Designing Clusters for Large Data Processing

Litoiu, Marin
Dates: 2015-08-28, 2015-08-28, 2014-12-02, 2015-08-28
URI: http://hdl.handle.net/10315/29952

Cloud computing is increasingly being adopted because of its cost savings and its ability to scale. As data continues to grow rapidly, a growing number of institutions are adopting non-standard SQL clusters to meet the storage and processing demands of large data. However, evaluating and modelling non-SQL clusters presents many challenges. To address some of these challenges, this thesis proposes a methodology for designing and modelling large-scale processing configurations that meet end-user requirements. First, goals are established for the big data cluster; in this thesis, the goals are performance and cost. Second, the data is transformed from a relational schema to an appropriate HBase schema. Third, we iteratively deploy different clusters, then model them and evaluate different topologies (size of instances, number of instances, number of clusters, etc.). We use HBase as the large data processing cluster, and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.

Language: en
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
Type: Electronic Thesis or Dissertation
Date: 2015-08-28
Keywords: Cloud, Big data, HBase