Bayesian Methods for Data Integration and High Dimensional Linear Model with Non-Sparsity

Zhang, Guan-Lin

Files

Zhang_Guan-Lin_2025_PhD.pdf (737.88 KB)

Date

2025-07-23

Authors

Zhang, Guan-Lin

Abstract

We address data integration where correlated data are collected across multiple platforms and the relationship between responses and predictors is modeled linearly. We extend this framework by incorporating random errors from sub-Gaussian and sub-exponential distributions. The goal is to identify key predictors across platforms, even as the number of predictors and the number of observations grow without bound.

Our approach combines marginal response densities from multiple platforms into a composite likelihood and introduces a Bayesian model selection criterion. Under regularity conditions, this criterion consistently selects the true model, even with a diverging model size. When true models differ across platforms, our method recovers the union support of predictors, that is, those relevant in at least one platform. We implement a Markov chain Monte Carlo (MCMC) algorithm for model selection.
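To make the idea of pooling platforms concrete, the following is a minimal sketch and not the criterion developed in the thesis. It assumes Gaussian errors, sums a profile log-likelihood over platforms as a simple composite likelihood, and applies a BIC-style penalty as a stand-in for the Bayesian criterion. The function names, the penalty, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def gaussian_profile_loglik(X, y, support):
    """Gaussian log-likelihood of one platform at the least-squares fit,
    using only the predictor columns listed in `support` (non-empty)."""
    n = len(y)
    Xs = X[:, support]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ beta) ** 2))
    sigma2 = rss / n  # maximum-likelihood noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

def composite_score(platforms, support):
    """Sum of per-platform log-likelihoods minus a BIC-style penalty.
    The penalty is an assumption of this sketch, not the thesis's criterion."""
    support = list(support)
    loglik = sum(gaussian_profile_loglik(X, y, support) for X, y in platforms)
    n_total = sum(len(y) for _, y in platforms)
    n_params = len(platforms) * len(support)  # separate coefficients per platform
    return loglik - 0.5 * n_params * np.log(n_total)

# Simulated example: three platforms sharing the true support {0, 1}.
rng = np.random.default_rng(0)
platforms = []
for _ in range(3):
    X = rng.standard_normal((200, 6))
    y = X[:, [0, 1]] @ np.array([2.0, -1.5]) + rng.standard_normal(200)
    platforms.append((X, y))

score_true = composite_score(platforms, [0, 1])
score_wrong = composite_score(platforms, [2, 3])
print(score_true > score_wrong)
```

In this toy setting a candidate support is scored jointly on all platforms, so a predictor that matters in every platform accumulates evidence from each of them. An MCMC search over supports would use such a score as its target.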

Simulations show that integrating multiple platforms improves model selection accuracy. Applied to financial data, our method combines information from three indices, identifying key predictors and yielding a more accurate predictive model with lower mean squared error than single-source models.

In high-dimensional regression, sparsity assumptions on regression coefficients often fail when most coefficients are nonzero, causing bias. To address this, we propose Bayesian Grouping-Gibbs Sampling (BGGS), which partitions coefficients into 𝑘 groups, enabling efficient high-dimensional sampling.
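As an illustration of sampling coefficients group by group, the following is a generic blocked Gibbs sampler for a linear model with a dense coefficient vector. It is a sketch under simplifying assumptions (known noise variance `sigma2`, an independent Gaussian prior with variance `tau2`, contiguous equal-sized groups) and should not be read as the BGGS algorithm itself, whose priors and grouping rule are specified in the thesis.

```python
import numpy as np

def blocked_gibbs(X, y, k, n_iter=300, burn_in=100,
                  sigma2=1.0, tau2=10.0, seed=0):
    """Posterior mean of beta under y ~ N(X beta, sigma2 I), beta ~ N(0, tau2 I),
    updating the coefficients one group at a time (k groups)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    groups = np.array_split(np.arange(p), k)  # hypothetical grouping rule
    beta = np.zeros(p)
    resid = y - X @ beta
    draws = []
    for it in range(n_iter):
        for g in groups:
            Xg = X[:, g]
            # Residual with group g's current contribution added back.
            partial = resid + Xg @ beta[g]
            # Conditional posterior of beta_g is Gaussian with this precision.
            prec = Xg.T @ Xg / sigma2 + np.eye(len(g)) / tau2
            mean = np.linalg.solve(prec, Xg.T @ partial / sigma2)
            chol = np.linalg.cholesky(prec)
            # Solving chol.T x = z gives x with covariance prec^{-1}.
            beta[g] = mean + np.linalg.solve(chol.T, rng.standard_normal(len(g)))
            resid = partial - Xg @ beta[g]
        if it >= burn_in:
            draws.append(beta.copy())
    return np.mean(draws, axis=0)

# Simulated example with a dense (non-sparse) coefficient vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
beta_true = rng.standard_normal(20)
y = X @ beta_true + rng.standard_normal(300)

post_mean = blocked_gibbs(X, y, k=4)
max_error = float(np.max(np.abs(post_mean - beta_true)))
print(max_error)
```

Each group update only requires factorizing a matrix whose size is the group size rather than the full number of predictors, which is the practical motivation for partitioning when the number of predictors is large.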

We explore the choice of 𝑘 via simulations and recommend an "elbow plot" for determining it. Theoretical analysis establishes model selection consistency and a bounded prediction error. Numerical experiments confirm BGGS’s advantage in estimation and prediction. Applied to financial data, it effectively identifies robust predictive models.

Keywords

Statistics

URI

https://hdl.handle.net/10315/42963

Collections

Mathematics & Statistics
