Discovery and Effective Use of Frequent Item-set Mining and Association Rules in Datasets
MetadataShow full item record
The unprecedented rise in digitized data generation has led to the ever-expanding demand for sophisticated storage and analysis methods capable of handling vast amounts of complex data, much of which is stored within many databases. Owing to the large size of such databases, employment of sophisticated analysis methods, such as data mining and machine learning, becomes necessary to extract useful insights regarding a given system under study. Frequent itemset mining and association rules mining represent two key approaches to mining knowledge stored in databases. However, handling of large databases often leads to time-consuming calculations that necessitate large amounts of memory. In this regard, the development of methods capable of enabling faster, less laborious search or pattern discovery remains a central focus in the field of data mining. Incontestably, such methods could aid in faster processing and knowledge extraction, enabling new breakthroughs in how knowledge is acquired from data and applied in real-world applications. However, real-world applications are often hindered by limitations inherent to currently available algorithms. For instance, many itemset mining algorithms are known to first store a given database as a tree structure in memory. However, such algorithms fail to provide a tight upper bound on the number of nodes that will be generated during the tree building process accordingly, there are no upper bounds governing the amount of memory that is needed to generate such trees. As such, practical implementation of frequent itemset mining algorithms is often restricted by memory consumption. However, despite the importance of memory consumption in the applicability of itemset mining, this factor has not drawn adequate attention from the data mining community and remains as a key challenge in its application. In addition, the majority of algorithms widely used and studied to date are known to require multiple database scans, a factor which restricts their applicability for incremental mining applications. In this regard, the development of an algorithm capable of dynamically mining frequent patterns on-the-fly would open new pathways in data mining, enabling the application of itemset mining methods to new real-world applications, in addition to vastly improving current applications. In this thesis, different approaches are proposed in relation to the above-mentioned limitations currently hampering further progress in this significant area of data mining. First, an upper bound on the number of nodes of well-known tree structures in frequent itemset mining is presented. Second, aiming to overcome the memory consumption constraint, a memory-efficient method to store data processed by the frequent itemset mining algorithm is proposed, where instead of a tree, data is stored in a compact directed graph whose nodes represent items. Third, an algorithm is proposed to overcome costly databases scans in the form of a novel SPFP-tree (single pass frequent pattern tree) algorithm. Lastly, approaches that allow for frequent itemset and association rules to be practically and effectively used in real world applications are proposed. First, the quality and effectiveness of frequent itemset mining in solving a real world facility management problem is examined. Second, with aims of improving the quality of recommendations made to users, as well as to overcome the cold-start problem suffered by new users, a hybrid approach is herein proposed for the application of association rules into recommender systems.