Statistical Methods for Complex and/or High Dimensional Data

Qin, Shanshan

dc.contributor.advisor	Wu, Yuehua
dc.contributor.author	Qin, Shanshan
dc.date.accessioned	2020-08-11T12:41:29Z
dc.date.available	2020-08-11T12:41:29Z
dc.date.copyright	2020-04
dc.date.issued	2020-08-11
dc.date.updated	2020-08-11T12:41:28Z
dc.degree.discipline	Mathematics & Statistics
dc.degree.level	Doctoral
dc.degree.name	PhD - Doctor of Philosophy
dc.description.abstract	This dissertation focuses on the development and implementation of statistical methods for high-dimensional and/or complex data, with an emphasis on three settings: $p$, the number of explanatory variables, larger than $n$, the number of observations; the ratio $p/n$ tending to a finite number; and data with outlier observations. First, we propose a non-negative feature selection and/or feature grouping (nnFSG) method. It handles a general class of sign-constrained high-dimensional regression problems and allows the regression coefficients to carry a structure of disjoint homogeneity, which includes sparsity as a special case. To solve the resulting non-convex optimization problem, we provide an algorithm that combines difference of convex programming, the augmented Lagrangian method, and coordinate descent. Furthermore, we show that the nnFSG method recovers the oracle estimate consistently and yields a bound on the mean squared error (MSE). We also examine the performance of the method using finite-sample simulations and a real protein mass spectrum dataset. Next, we consider a high-dimensional multivariate ridge regression model under the regime where both $p$ and $n$ are large with $p/n \rightarrow \kappa$ $(0<\kappa<\infty)$. Using a double leave-one-out method, we develop a nonlinear system of two deterministic equations that characterizes the behaviour of the M-estimate. The theoretical results are confirmed by simulations. Finally, we present matching quantiles M-estimation (MQME), a novel method for establishing the relationship between the target response variable and the explanatory variables. MQME generalizes the matching quantiles estimation (MQE) method by replacing the ordinary least-squares (OLS) estimation with an M-estimation, which is resistant to outlier observations of the target response. In addition, MQME is combined with an adaptive Lasso penalty so that it can select informative variables. We also propose an iterative algorithm to compute the MQME estimate and prove its consistency, as has been done for the MQE. Numerical experiments on simulated and real datasets demonstrate the efficient performance of our method.
dc.identifier.uri	http://hdl.handle.net/10315/37706
dc.language	en
dc.rights	Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject	Statistics
dc.subject.keywords	coordinate descent
dc.subject.keywords	cross-validation
dc.subject.keywords	difference convex programming
dc.subject.keywords	double leave-one-out
dc.subject.keywords	feature grouping
dc.subject.keywords	feature selection
dc.subject.keywords	high-dimensional
dc.subject.keywords	matching quantiles
dc.subject.keywords	M-estimation
dc.subject.keywords	multivariate regression
dc.subject.keywords	non-negative constraint
dc.subject.keywords	outlier observations
dc.subject.keywords	regularization
dc.title	Statistical Methods for Complex and/or High Dimensional Data
dc.type	Electronic Thesis or Dissertation

Files

Original bundle

Name: Qin_Shanshan_2020_PhD.pdf
Size: 3.69 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.36 KB
Format: Plain Text

Collections

Mathematics & Statistics