Statistical Methods for Complex and/or High Dimensional Data
Abstract
This dissertation focuses on the development and implementation of statistical methods for high-dimensional and/or complex data, with an emphasis on
First, we propose a non-negative feature selection and/or feature grouping (nnFSG) method. It addresses a general class of sign-constrained high-dimensional regression problems in which the regression coefficients may carry a structure of disjoint homogeneity, with sparsity as a special case. To solve the resulting non-convex optimization problem, we provide an algorithm that combines difference-of-convex (DC) programming, the augmented Lagrangian method, and coordinate descent. Furthermore, we show that the nnFSG method consistently recovers the oracle estimate, and we derive a bound on its mean squared error (MSE). In addition, we examine the performance of the method through finite-sample simulations and an analysis of a real protein mass spectrometry dataset.
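The abstract does not spell out the nnFSG penalty or algorithm. The sketch below is only a rough illustration of how DC programming and coordinate descent fit together under a non-negativity constraint. It makes several assumptions:

- It covers only the sparsity special case, with no grouping, so the augmented Lagrangian step is omitted.
- It assumes a truncated-L1 penalty.
- The function names `nn_weighted_lasso_cd` and `nn_tlp_dc` are hypothetical, as are the tuning values `lam` and `tau`.

It is not the dissertation's implementation.

```python
import numpy as np


def nn_weighted_lasso_cd(X, y, w, beta0, n_iter=500, tol=1e-10):
    """Coordinate descent for the convex subproblem
        min_{beta >= 0}  (1/(2n)) ||y - X beta||^2 + sum_j w_j * beta_j.
    Since beta_j >= 0, |beta_j| = beta_j, so each coordinate update is a
    one-dimensional quadratic minimized over [0, inf)."""
    n, p = X.shape
    beta = beta0.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n        # x_j^T x_j / n
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # x_j^T (residual with coordinate j added back) / n
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = max(0.0, (rho - w[j]) / col_sq[j])   # projected update
            delta = new - beta[j]
            if delta != 0.0:
                resid -= delta * X[:, j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def nn_tlp_dc(X, y, lam, tau, n_outer=50):
    """Hypothetical DC loop for the non-negative truncated-L1 problem
        min_{beta >= 0} (1/(2n)) ||y - X beta||^2 + lam * sum_j min(beta_j / tau, 1).
    Writing min(b/tau, 1) = b/tau - max(b/tau - 1, 0) and linearizing the
    concave part at the current iterate yields a weighted non-negative lasso
    in which coefficients already above tau are left unpenalized."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_outer):
        w = (lam / tau) * (beta <= tau)      # linearized DC weights
        new_beta = nn_weighted_lasso_cd(X, y, w, beta)
        converged = np.allclose(new_beta, beta, atol=1e-8)
        beta = new_beta
        if converged:
            break
    return beta


# Toy example: two active non-negative coefficients, eight inactive ones.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, 2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = nn_tlp_dc(X, y, lam=0.1, tau=0.5)
```

Each outer step solves a convex weighted non-negative lasso. On data like this, the first step shrinks every coefficient. Later steps drop the penalty on coefficients that exceed `tau`, which removes the shrinkage bias on the large ones while keeping the inactive ones at exactly zero.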
Next, we consider a high-dimensional multivariate ridge regression model under the regime where both
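The description of the asymptotic regime is cut off here, so the sketch below does not depend on it. It only shows the standard multivariate ridge estimator. It assumes a single penalty `lam` shared by all q response columns and an unscaled squared Frobenius-norm loss; the function name `multivariate_ridge` is hypothetical.

```python
import numpy as np


def multivariate_ridge(X, Y, lam):
    """Ridge estimate B = argmin_B ||Y - X B||_F^2 + lam * ||B||_F^2
    for an n x q response matrix Y. The closed form is
    B = (X^T X + lam I_p)^{-1} X^T Y; when p > n, the equivalent form
    X^T (X X^T + lam I_n)^{-1} Y solves a smaller n x n system."""
    n, p = X.shape
    if p <= n:
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)


rng = np.random.default_rng(1)
n, p, q = 50, 200, 3                 # toy high-dimensional setting: p > n
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
B_hat = multivariate_ridge(X, Y, lam=1.0)

# Sanity check against the p x p primal formula.
B_primal = np.linalg.solve(X.T @ X + 1.0 * np.eye(p), X.T @ Y)
```

With a shared penalty, the multivariate fit decouples into q separate ridge regressions, one per response column. The n × n form is simply the cheaper route to the same estimate when p exceeds n.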
Finally, we present matching quantiles M-estimation (MQME), a novel method for establishing the relationship between a target response variable and a set of explanatory variables. MQME generalizes the matching quantiles estimation (MQE) method by replacing its ordinary least-squares (OLS) estimation step with M-estimation, which is robust to outliers in the target response. In addition, MQME is combined with an adaptive Lasso penalty so that it can select informative variables. We also propose an iterative algorithm to compute the MQME estimate and prove that the estimate is consistent, as is the MQE estimate. Numerical experiments on simulated and real datasets demonstrate the effectiveness of the method.
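The following is a minimal sketch of an MQME-style iteration. It assumes the MQE scheme: sort the responses, pair them with the rows of X ordered by the current fitted values, and refit until that ordering stops changing. Further assumptions:

- Huber's loss stands in for the unspecified M-estimator, with tuning constant 1.345 and the residual scale taken from the normalized MAD.
- The adaptive Lasso penalty is omitted.
- The names `huber_irls` and `mqme` are hypothetical.

This is not the dissertation's algorithm.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of regression coefficients by iteratively
    reweighted least squares; the residual scale is re-estimated each
    step by the normalized median absolute deviation (MAD)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        s = max(s, 1e-12)                               # guard against zero scale
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))   # Huber weights psi(u)/u
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def mqme(X, y, c=1.345, max_iter=50):
    """Matching-quantiles iteration with a Huber fit in place of OLS:
    pair the i-th smallest response with the row of X whose current fitted
    value is the i-th smallest, refit, and stop once that ordering is stable."""
    y_sorted = np.sort(y)
    beta = huber_irls(X, y, c)               # start from the paired robust fit
    order = None
    for _ in range(max_iter):
        new_order = np.argsort(X @ beta, kind="stable")
        if order is not None and np.array_equal(new_order, order):
            break
        order = new_order
        beta = huber_irls(X[order], y_sorted, c)
    return beta


rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y_clean = X @ beta_true
beta_clean = mqme(X, y_clean)                # noiseless, paired data

y_out = y_clean + 0.1 * rng.standard_normal(n)
y_out[:5] += 50.0                            # a few gross outliers in the response
beta_out = mqme(X, y_out)
```

Swapping the Huber step back to plain least squares would recover an MQE-style iteration. In that case the five inflated responses, which land at the top of the sorted response vector, would enter the refit with full weight instead of being downweighted.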