Information Systems and Technology
Permanent URI for this collection
Browse
Browsing Information Systems and Technology by Subject "Big data"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Open Access A Hierarchical Rule-Based Security Management System for Date-Intensive Applications(2018-11-21) Rouf, Yar Akhter; Litoiu, MarinApplications in today's software development environment evolve at a rapid rate, constantly providing their users with new functionalities. As a result, it becomes increasingly complex to understand the entire application. The security team and the developers may not completely understand each others approaches, resulting in a less secure system with vulnerabilities. In addition, there is large amount of security data to be analyzed. To mitigate these issues, we propose a platform to support the SecDevOps framework, a hierarchical distributed architecture for security control that uses a Business Rules Engine (BRE). The BRE simplifies security rules by allowing the teams to write them at an operational level rather than at the network level, which requires specialized knowledge. Business rules are universally understood by the different teams, resulting in effective inter-team communication. Additionally, the platform can expand and scale with new security rules and data sources at runtime in a systematic manner.Item Open Access An Approach to Designing Clusters for Large Data Processing(2015-08-28) Sandel, Roni; Litoiu, MarinCloud computing is increasingly being adopted due to its cost savings and abilities to scale. As data continues to grow rapidly, an increasing amount of institutions are adopting non standard SQL clusters to address the storage and processing demands of large data. However, evaluating and modelling non SQL clusters presents many challenges. In order to address some of these challenges, this thesis proposes a methodology for designing and modelling large scale processing configurations that respond to the end user requirements. Firstly, goals are established for the big data cluster. In this thesis, we use performance and cost as our goals. Secondly, the data is transformed from relational data schema to an appropriate HBase schema. In the third step, we iteratively deploy different clusters. We then model the clusters and evaluate different topologies (size of instances, number of instances, number of clusters, etc.). We use HBase as the large data processing cluster and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.Item Open Access Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets(2018-08-27) Zhang, Jian; Yang, Zijiang CynthiaResearch into the combination of data mining and machine learning technology with web-based education systems (known as education data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of the Information Communication Technology (ICT), data are becoming available at a significantly large volume, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied to Apache Spark, using large volumes of datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Various volumes of datasets are tested on Spark MLlib with different running configurations and parameter tunings. The thesis convincingly presents useful strategies for allocating computing resources and tuning to take full advantage of the in-memory system of Apache Spark to conduct the tasks of data mining and machine learning. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.