Bayesian Methods For Data Integration And High Dimensional Linear Model With Non-Sparsity

dc.contributor.advisor: Xin Gao
dc.contributor.author: Zhang, Guan-Lin
dc.date.accessioned: 2025-07-23T15:10:19Z
dc.date.available: 2025-07-23T15:10:19Z
dc.date.copyright: 2025-03-07
dc.date.issued: 2025-07-23
dc.date.updated: 2025-07-23T15:10:18Z
dc.degree.discipline: Mathematics & Statistics
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: We address data integration in which correlated data are collected across multiple platforms and responses are modeled as linear in the predictors. We extend this framework to allow random errors with sub-Gaussian or sub-exponential distributions. The goal is to identify the key predictors across platforms, even as the numbers of predictors and observations diverge. Our approach combines the marginal response densities from the platforms into a composite likelihood and introduces a Bayesian model selection criterion. Under regularity conditions, this criterion consistently selects the true model, even with a diverging model size. When the true models differ across platforms, our method recovers the union support of the predictors, that is, those relevant in at least one platform. We implement a Markov chain Monte Carlo (MCMC) algorithm for model selection. Simulations show that integrating multiple platforms improves model selection accuracy. Applied to financial data, our method combines information from three indices, identifying key predictors and yielding a predictive model with lower mean squared error than single-source models. In high-dimensional regression, sparsity assumptions on the regression coefficients often fail when most coefficients are nonzero, which causes bias. To address this, we propose Bayesian Grouping-Gibbs Sampling (BGGS), which partitions the coefficients into 𝑘 groups, enabling efficient high-dimensional sampling. We explore the selection of 𝑘 via simulations and recommend an "elbow plot" for choosing it. Theoretical analysis establishes model selection consistency and a bounded prediction error. Numerical experiments confirm BGGS’s advantage in estimation and prediction. Applied to financial data, it effectively identifies robust predictive models.
dc.identifier.uri: https://hdl.handle.net/10315/42963
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Statistics
dc.subject.keywords: Bayesian method
dc.subject.keywords: Data integration
dc.subject.keywords: Gibbs sampling
dc.subject.keywords: Model selection
dc.subject.keywords: Sub-Gaussian
dc.subject.keywords: Sub-exponential
dc.subject.keywords: Union support recovery
dc.subject.keywords: Non-sparse
dc.subject.keywords: High-dimensional
dc.subject.keywords: Linear regression
dc.title: Bayesian Methods For Data Integration And High Dimensional Linear Model With Non-Sparsity
dc.type: Electronic Thesis or Dissertation
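
The abstract describes BGGS only as partitioning the regression coefficients into 𝑘 groups for Gibbs sampling; this record does not give the prior, the partition rule, or the update equations. As a rough illustration of the general idea, the following is a minimal sketch of a generic blocked Gibbs sampler for a linear model, not the thesis's algorithm. It assumes a Gaussian prior on the coefficients, a known noise variance, and a random equal-size partition; the function name `grouped_gibbs` and all parameters are hypothetical.

```python
import numpy as np


def grouped_gibbs(X, y, k, n_iter=500, burn_in=100, sigma2=1.0, tau2=10.0, seed=0):
    """Blocked Gibbs sampler for y = X @ beta + noise (illustrative sketch).

    Assumptions (not taken from the thesis): noise ~ N(0, sigma2 * I) with
    sigma2 known, prior beta ~ N(0, tau2 * I), and a random partition of the
    p coefficients into k groups of nearly equal size.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if not 1 <= k <= p:
        raise ValueError("k must satisfy 1 <= k <= p")
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")

    groups = np.array_split(rng.permutation(p), k)

    # The conditional covariance of each group does not depend on the other
    # coefficients, so it is computed once, together with its Cholesky factor.
    covs, chols = [], []
    for idx in groups:
        Xg = X[:, idx]
        precision = Xg.T @ Xg / sigma2 + np.eye(len(idx)) / tau2
        cov = np.linalg.inv(precision)
        covs.append(cov)
        chols.append(np.linalg.cholesky(cov))

    beta = np.zeros(p)
    resid = y - X @ beta  # full residual, kept up to date after every group
    draws = np.empty((n_iter - burn_in, p))

    for it in range(n_iter):
        for idx, cov, chol in zip(groups, covs, chols):
            Xg = X[:, idx]
            # Residual with this group's contribution added back in,
            # i.e. y minus the fit of all other groups.
            partial = resid + Xg @ beta[idx]
            mean = cov @ (Xg.T @ partial) / sigma2
            # Draw the whole group jointly from its Gaussian conditional.
            beta[idx] = mean + chol @ rng.standard_normal(len(idx))
            resid = partial - Xg @ beta[idx]
        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws
```

In this sketch each update factorizes only a matrix of size roughly p/k, which is the usual motivation for grouping: `k=1` draws all coefficients jointly, while `k=p` reduces to coordinate-wise Gibbs sampling. Averaging the returned draws over iterations gives a posterior-mean estimate of the coefficients under the stated assumptions.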

Files

Original bundle

Name: Zhang_Guan-Lin_2025_PhD.pdf
Size: 737.88 KB
Format: Adobe Portable Document Format