Bayesian Methods For Data Integration And High Dimensional Linear Model With Non-Sparsity

Zhang, Guan-Lin

dc.contributor.advisor	Xin Gao
dc.contributor.author	Zhang, Guan-Lin
dc.date.accessioned	2025-07-23T15:10:19Z
dc.date.available	2025-07-23T15:10:19Z
dc.date.copyright	2025-03-07
dc.date.issued	2025-07-23
dc.date.updated	2025-07-23T15:10:18Z
dc.degree.discipline	Mathematics & Statistics
dc.degree.level	Doctoral
dc.degree.name	PhD - Doctor of Philosophy
dc.description.abstract	We address data integration where correlated data are collected across multiple platforms, modeling the relationship between responses and predictors linearly. We extend this framework by allowing random errors from sub-Gaussian and sub-exponential distributions. The goal is to identify key predictors across platforms, even as the number of predictors and the number of observations grow without bound. Our approach combines marginal response densities from multiple platforms into a composite likelihood and introduces a Bayesian model selection criterion. Under regularity conditions, this criterion consistently selects the true model, even with a diverging model size. When true models differ across platforms, our method recovers the union support of predictors, that is, those relevant in at least one platform. We implement a Markov chain Monte Carlo (MCMC) algorithm for model selection. Simulations show that integrating multiple platforms improves model selection accuracy. Applied to financial data, our method combines information from three indices, identifying key predictors and yielding a more accurate predictive model with lower mean squared error than single-source models. In high-dimensional regression, sparsity assumptions on the regression coefficients often fail when most coefficients are nonzero, which biases the resulting estimates. To address this, we propose Bayesian Grouping-Gibbs Sampling (BGGS), which partitions coefficients into 𝑘 groups, enabling efficient high-dimensional sampling. We explore the selection of 𝑘 via simulations and recommend an "elbow plot" for choosing it. Theoretical analysis establishes model selection consistency and bounded prediction error. Numerical experiments confirm BGGS’s advantage in estimation and prediction. Applied to financial data, it effectively identifies robust predictive models.
dc.identifier.uri	https://hdl.handle.net/10315/42963
dc.language	en
dc.rights	Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject	Statistics
dc.subject.keywords	Bayesian method
dc.subject.keywords	Data integration
dc.subject.keywords	Gibbs sampling
dc.subject.keywords	Model selection
dc.subject.keywords	Sub-Gaussian
dc.subject.keywords	Sub-exponential
dc.subject.keywords	Union support recovery
dc.subject.keywords	Non-sparse
dc.subject.keywords	High-dimensional
dc.subject.keywords	Linear regression
dc.title	Bayesian Methods For Data Integration And High Dimensional Linear Model With Non-Sparsity
dc.type	Electronic Thesis or Dissertation
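
The abstract describes sampling regression coefficients in 𝑘 groups. As a rough illustration of that general idea only, the sketch below runs a generic blocked Gibbs sampler for a dense linear model. It is not the thesis's BGGS algorithm: the Gaussian prior, the fixed noise variance `sigma2`, the fixed prior variance `tau2`, the contiguous equal-size grouping, and the function name `grouped_gibbs` are all assumptions made for this example.

```python
import numpy as np

def grouped_gibbs(X, y, k, n_iter=1500, sigma2=1.0, tau2=10.0, seed=0):
    """Blocked Gibbs sampler for y = X beta + noise (illustrative only).

    Assumes noise ~ N(0, sigma2 * I) and prior beta ~ N(0, tau2 * I),
    with sigma2 and tau2 treated as known constants.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Hypothetical grouping: split the p indices into k contiguous blocks.
    groups = np.array_split(np.arange(p), k)

    # The conditional covariance of each block does not depend on the
    # other coefficients, so it can be factorized once up front.
    factors = []
    for g in groups:
        Xg = X[:, g]
        precision = Xg.T @ Xg / sigma2 + np.eye(len(g)) / tau2
        cov = np.linalg.inv(precision)
        factors.append((cov, np.linalg.cholesky(cov)))

    beta = np.zeros(p)
    resid = y - X @ beta
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        for g, (cov, chol) in zip(groups, factors):
            Xg = X[:, g]
            # Residual with block g's current contribution added back.
            partial = resid + Xg @ beta[g]
            mean = cov @ (Xg.T @ partial) / sigma2
            # Draw the whole block from its Gaussian full conditional.
            beta[g] = mean + chol @ rng.standard_normal(len(g))
            resid = partial - Xg @ beta[g]
        draws[t] = beta
    return draws

# Small synthetic example with a dense (non-sparse) coefficient vector.
rng = np.random.default_rng(1)
n, p = 200, 12
X = rng.standard_normal((n, p))
beta_true = rng.uniform(0.5, 1.5, size=p)  # every coefficient is nonzero
y = X @ beta_true + rng.standard_normal(n)

draws = grouped_gibbs(X, y, k=3)
post_mean = draws[300:].mean(axis=0)  # discard the first 300 draws as burn-in
```

In this simplified setting, a larger `k` means smaller matrices to factorize per block but more sequential updates per sweep, and the chain may mix more slowly when predictors in different blocks are strongly correlated. How the thesis balances this trade-off when choosing 𝑘 is not specified in this record beyond the "elbow plot" recommendation.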

Files

Original bundle

Name: Zhang_Guan-Lin_2025_PhD.pdf
Size: 737.88 KB
Format: Adobe Portable Document Format

Collections

ETD SWORD Deposit