Bayesian Methods For Data Integration And High Dimensional Linear Model With Non-Sparsity

Authors

Zhang, Guan-Lin

Abstract

We address data integration in which correlated data are collected across multiple platforms and the relationship between responses and predictors is modeled as linear. We extend this framework to allow random errors from sub-Gaussian and sub-exponential distributions. The goal is to identify the key predictors across platforms, even as the number of predictors and the number of observations both grow without bound.

Our approach combines the marginal response densities from the individual platforms into a composite likelihood and introduces a Bayesian model selection criterion. Under regularity conditions, this criterion consistently selects the true model, even when the model size diverges. When the true models differ across platforms, our method recovers the union support, that is, the set of predictors relevant in at least one platform. We implement a Markov chain Monte Carlo (MCMC) algorithm for model selection.
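
As a rough illustration of what combining marginal densities into a composite likelihood can look like, the display below uses notation that is assumed here and does not come from the thesis: M platforms, response vector y^{(m)} and design matrix X^{(m)} on platform m, a candidate predictor set γ, and platform-specific parameters θ^{(m)}. The thesis may use weights or a different parameterization.

```latex
\[
  L_C(\theta_\gamma)
    = \prod_{m=1}^{M} f_m\!\left(y^{(m)} \,\middle|\, X^{(m)}_{\gamma};\, \theta^{(m)}_{\gamma}\right),
  \qquad
  \ell_C(\theta_\gamma)
    = \sum_{m=1}^{M} \log f_m\!\left(y^{(m)} \,\middle|\, X^{(m)}_{\gamma};\, \theta^{(m)}_{\gamma}\right).
\]
```

The product treats the platform-level marginal densities as if they were independent, which is what makes the likelihood "composite" rather than a full joint likelihood of the correlated data.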

Simulations show that integrating multiple platforms improves model selection accuracy. Applied to financial data, our method combines information from three indices, identifying key predictors and yielding a more accurate predictive model with lower mean squared error than single-source models.

In high-dimensional regression, sparsity assumptions on regression coefficients often fail when most coefficients are nonzero, causing bias. To address this, we propose Bayesian Grouping-Gibbs Sampling (BGGS), which partitions coefficients into 𝑘 groups, enabling efficient high-dimensional sampling.
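
To make the idea of sampling coefficients group by group concrete, here is a minimal sketch of a generic blocked Gibbs sampler for a Gaussian linear model. It is not the BGGS algorithm from the thesis: the function name `grouped_gibbs`, the N(0, tau2) prior on each coefficient, the known noise variance `sigma2`, and the contiguous equal-size grouping are all assumptions made for illustration.

```python
import numpy as np

def grouped_gibbs(X, y, k, n_iter=200, sigma2=1.0, tau2=10.0, seed=0):
    """Blocked Gibbs sampler for y = X @ beta + noise (illustrative only).

    Assumes noise ~ N(0, sigma2 * I) with sigma2 known and an independent
    N(0, tau2) prior on every coefficient. Coefficients are split into k
    contiguous groups, and each group is drawn from its Gaussian full
    conditional given the current values of all other groups.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    groups = np.array_split(np.arange(p), k)

    # The conditional covariance of each group does not depend on the
    # other coefficients, so it can be computed once up front.
    factors = []
    for idx in groups:
        Xg = X[:, idx]
        precision = Xg.T @ Xg / sigma2 + np.eye(len(idx)) / tau2
        cov = np.linalg.inv(precision)
        factors.append((cov, np.linalg.cholesky(cov)))

    beta = np.zeros(p)
    resid = y - X @ beta
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        for idx, (cov, chol) in zip(groups, factors):
            Xg = X[:, idx]
            # Residual with the current group's contribution added back:
            # y minus the fit from all other groups.
            partial = resid + Xg @ beta[idx]
            mean = cov @ (Xg.T @ partial) / sigma2
            beta[idx] = mean + chol @ rng.standard_normal(len(idx))
            resid = partial - Xg @ beta[idx]
        draws[t] = beta
    return draws
```

Each update only requires factorizing a matrix whose size is that of one group rather than all p coefficients, which is the computational motivation for grouping. Calling `grouped_gibbs(X, y, k=4)` returns an array of shape `(n_iter, p)` whose rows are successive draws of the coefficient vector.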

We explore the choice of 𝑘 through simulations and recommend an "elbow plot" for selecting it. Theoretical analysis establishes model selection consistency and a bounded prediction error. Numerical experiments confirm BGGS’s advantage in estimation and prediction. Applied to financial data, it identifies robust predictive models.

Keywords

Statistics
