Statistical Modeling for Complex Data

dc.contributor.advisor: Wu, Yuehua
dc.contributor.advisor: Fu, Yuejiao
dc.contributor.author: Sun, Bin
dc.date.accessioned: 2020-05-11T12:50:50Z
dc.date.available: 2020-05-11T12:50:50Z
dc.date.copyright: 2019-11
dc.date.issued: 2020-05-11
dc.date.updated: 2020-05-11T12:50:50Z
dc.degree.discipline: Mathematics & Statistics
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: In this dissertation, we focus on statistical modeling techniques for exploring complex data with features such as high dimensionality, nonstationary structure, heavy-tailed distributions, and missing data. We study four problems: dimension reduction in high-dimensional data, clarifying complex patterns in nonstationary spatial data, improving hierarchical Bayesian modeling of spatio-temporal data with a staircase pattern of missing observations, and detecting change points in spatio-temporal data with outliers and heavy-tailed observations. Sufficient dimension reduction has drawn considerable attention over the last twenty years because of the rapidly increasing dimension of the covariates. The semiparametric approach to dimension reduction proposed by Ma and Zhu [2012] is a novel approach to dimension-reduction problems that differs completely from those in the existing literature. We present a theoretical result that relaxes a critical condition required by the semiparametric approach. The asymptotic normality of the estimators still holds under weaker assumptions. This improvement increases the applicability of the semiparametric approach. For spatial data, nonstationarity makes it difficult to learn the underlying processes and, more specifically, to characterize spatial dependence using the semivariogram model. We improve the dimension-expansion modeling technique proposed by Bornn et al. [2012] by taking the correlation structure into account. We propose two generalized least-squares methods. Both methods provide more accurate parameter estimates than the least-squares method, as demonstrated through simulation studies and real data analyses. Because spatio-temporal data are usually observed over a large area and over many years, modeling them is non-trivial. Missing data make the task even more challenging.
One of the problems discussed in this dissertation is modeling ozone concentrations in a region in the presence of missing data. We propose a method that makes no assumptions on the correlation structure: it estimates the covariance matrix using the dimension expansion method for modeling semivariograms in nonstationary fields, based on the estimates from the hierarchical Bayesian spatio-temporal modeling technique [Le and Zidek, 2006]. For demonstration, we apply the method to ozone concentrations at 25 stations in the Pittsburgh region studied in Jin et al. [2012]. A comparison of the proposed method with the one in Jin et al. [2012] is provided through leave-one-out cross-validation, which shows that the proposed method is more general and applicable. The last problem, which is also related to spatio-temporal data, is detecting structural changes in spatio-temporal data with missing observations in the presence of outliers and heavy-tailed observations. We improve the estimation algorithm of a general spatio-temporal autoregressive (GSTAR) model proposed by Wu et al. [2017]. We propose an M-estimation-based EM algorithm and a change-point detection procedure. Through data examples, we compare the proposed algorithm and the proposed change-point detection procedure with the existing ones and show that our method provides more robust estimation and is more accurate in detecting change points in the presence of outliers and/or heavy-tailed observations.
dc.identifier.uri: https://hdl.handle.net/10315/37444
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Statistics
dc.subject.keywords: Sufficient dimension reduction
dc.subject.keywords: Generalized least-squares
dc.subject.keywords: Dimension expansion
dc.subject.keywords: Nonstationary modeling
dc.subject.keywords: Spatial statistics
dc.subject.keywords: Change point detection
dc.title: Statistical Modeling for Complex Data
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Sun_Bin_2019_PhD.pdf
Size: 4.18 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.83 KB
Format: Plain Text
Name: YorkU_ETDlicense.txt
Size: 3.36 KB
Format: Plain Text