High-Dimensional Data Integration with Multiple Heterogeneous and Outlier Contaminated Tasks

dc.contributor.advisorXu, Wei
dc.contributor.advisorGao, Xin
dc.contributor.authorZhong, Yuan
dc.date.accessioned2023-10-03T20:07:30Z
dc.date.available2023-10-03T20:07:30Z
dc.date.issued2023-02
dc.date.updated2023-10-03T20:07:29Z
dc.degree.disciplineMathematics & Statistics
dc.degree.levelDoctoral
dc.degree.namePhD - Doctor of Philosophy
dc.description.abstractData integration is the process of extracting information from multiple sources and analyzing related data sets simultaneously. The aggregated information can reduce sample biases caused by low-quality data, boost the statistical power of joint inference, and improve model prediction. This dissertation therefore focuses on the development and implementation of statistical methods for data integration. In clinical research, study outcomes usually consist of several kinds of patient information recorded in response to the treatment. Because joint inference across related data sets can provide more efficient estimates than marginal approaches, analyzing multiple clinical endpoints simultaneously gives a better understanding of treatment effects. Meanwhile, data from different studies are usually heterogeneous, with both continuous and discrete endpoints. To alleviate the computational difficulties, we apply the pairwise composite likelihood method to analyze the data. We show that the estimators are consistent and asymptotically normally distributed, with asymptotic variance based on the Godambe information. Under high dimensionality, the joint model needs to select the important features in order to analyze the intrinsic relatedness among all data sets. Multi-task feature learning is widely used to recover the union support of these features through the penalized M-estimation framework. However, heterogeneity among the data sets can make the joint model difficult to formulate. We therefore propose a mixed $\ell_{2,1}$ regularized composite quasi-likelihood function to perform multi-task feature learning. Our framework relaxes the distributional assumptions on the responses, and our results establish sign recovery consistency and estimation error bounds for the penalized estimates. When data from multiple sources are contaminated by large outliers, multi-task learning methods suffer a loss of efficiency.
Next, we propose robust multi-task feature learning that combines adaptive Huber regression tasks with mixed regularization. The robustification parameters can be chosen to adapt to the sample size, model dimension, and error moments while striking a balance between unbiasedness and robustness. We consider heavy-tailed distributions for multiple data sets that have a bounded $(1+\omega)$th moment for any $\omega>0$. Our method is shown to achieve estimation consistency and sign recovery consistency. In addition, a robust information criterion allows joint inference on related tasks for consistent model selection.
dc.identifier.urihttps://hdl.handle.net/10315/41453
dc.languageen
dc.rightsAuthor owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subjectStatistics
dc.subject.keywordsData integration
dc.subject.keywordsComposite likelihood
dc.subject.keywordsPenalized M-estimation
dc.subject.keywordsRobust M-estimation
dc.subject.keywordsMixed ℓ2,1 Regularization
dc.subject.keywordsAdaptive Huber regression
dc.subject.keywordsOutlier contamination
dc.titleHigh-Dimensional Data Integration with Multiple Heterogeneous and Outlier Contaminated Tasks
dc.typeElectronic Thesis or Dissertation
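
The abstract names two building blocks, the Huber loss and the mixed ℓ2,1 penalty, without giving formulas. The sketch below illustrates them using their standard textbook definitions. It is not the dissertation's estimator: the function names, the fixed robustification parameter `tau`, and the small coefficient matrix are all assumptions made for illustration, and the adaptive choice of `tau` described in the abstract is not implemented.

```python
import math


def huber_loss(residual, tau):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond it.

    tau is held fixed here (an assumption); the abstract describes
    choosing it adaptively, which this sketch does not attempt.
    """
    a = abs(residual)
    if a <= tau:
        return 0.5 * residual * residual
    return tau * a - 0.5 * tau * tau


def l21_norm(B):
    """Mixed l2,1 norm: sum over rows (features) of the Euclidean
    norm of that feature's coefficients across tasks."""
    return sum(math.sqrt(sum(b * b for b in row)) for row in B)


def prox_l21(B, threshold):
    """Row-wise group soft-thresholding, the proximal operator of
    threshold * l2,1. A row whose norm is at most the threshold is set
    to zero for every task at once, which is how the penalty selects
    features jointly across tasks."""
    out = []
    for row in B:
        norm = math.sqrt(sum(b * b for b in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out.append([scale * b for b in row])
    return out


# Hypothetical coefficient matrix: 3 features (rows) by 2 tasks (columns).
B = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
shrunk = prox_l21(B, 1.0)
```

In this toy example the first feature has a row norm of 5 and is shrunk but kept for both tasks, while the second feature has a row norm of about 0.5, below the threshold of 1, and is removed from both tasks together.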

Files

Original bundle
Name: Zhong_Yuan_2023_PhD.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text