Some Aspects on Data Modelling

dc.contributor.advisor: Wu, Yuehua
dc.creator: Sun, Xiaoying
dc.date.accessioned: 2018-03-01T13:41:56Z
dc.date.available: 2018-03-01T13:41:56Z
dc.date.copyright: 2017-04-03
dc.date.issued: 2018-03-01
dc.date.updated: 2018-03-01T13:41:56Z
dc.degree.discipline: Mathematics & Statistics
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: Statistical methods are motivated by the desire to learn from data. Transaction datasets and time-ordered data sequences are common in many research areas, such as finance, bioinformatics and text mining. This dissertation studies two problems concerning these two types of data: association rule mining from transaction data and structural change estimation in time-ordered sequences. Informative association rule mining is fundamental for knowledge discovery from transaction data, for which brute-force search algorithms, e.g., the well-known Apriori algorithm, were developed. However, running these algorithms becomes computationally intractable when the rule space is large. A stochastic search framework is developed to tackle this challenge by imposing a probability distribution on the association rule space and using the idea of annealing Gibbs sampling. With this algorithm, a large rule space of exponential order can still be searched randomly to generate a Markov chain of viable length. This chain contains the most informative rules with probability one. The stochastic search algorithm is flexible enough to incorporate any measure of interest. Moreover, it reduces computational complexity and memory requirements. A time-ordered data sequence may contain sudden changes at some time points, before and after which the data follow different distributions or statistical models. Change point problems in generalized linear models and in distributions of independent random variables are studied in turn. First, to estimate multiple change points in generalized linear models, the problem is converted into a model selection problem. Modern model selection techniques are then applied to estimate the regression coefficients. A consistent estimator of the number of change points is developed, and an algorithm is provided to estimate the change points. Second, to estimate a single change point in distributions of independent random variables, a change point estimator based on empirical characteristic functions is proposed. Its consistency is also established.
dc.identifier.uri: http://hdl.handle.net/10315/34237
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Statistics
dc.subject.keywords: Association rule
dc.subject.keywords: Gibbs sampling
dc.subject.keywords: Transaction data
dc.subject.keywords: Genomic data
dc.subject.keywords: Multiple change points
dc.subject.keywords: GLM
dc.subject.keywords: SIS
dc.subject.keywords: MCP
dc.subject.keywords: Segmentation
dc.subject.keywords: Change point estimator
dc.subject.keywords: Empirical characteristic function
dc.title: Some Aspects on Data Modelling
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Sun_Xiaoying_2017_PhD.pdf
Size: 670.22 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.83 KB
Format: Plain Text
Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text