Bayesian Model Selection for Discrete Graphical Models

Date

2023-08-04

Authors

Roach, Lyndsay

Abstract

Graphical models allow for easy interpretation and representation of complex distributions. There is an expanding interest in model selection problems for high-dimensional graphical models, particularly when the number of variables increases with the sample size. A popular model selection tool is the Bayes factor, which compares the posterior probabilities of two competing models. Consider data given in the form of a contingency table in which N objects are classified according to q random variables, and the conditional independence structure of these random variables is represented by a discrete graphical model G. We assume the cell counts follow a multinomial distribution with a hyper Dirichlet prior distribution imposed on the cell probability parameters. Then we can write the Bayes factor as a product of gamma functions indexed by the cliques and separators of G.
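As an illustration of the clique and separator factorization described above, the following is a minimal Python sketch, not the thesis's own implementation. It assumes a decomposable graph and a hyper Dirichlet prior whose hyperparameters split a total mass alpha evenly over the cells of each marginal table, which is one common consistent choice and an assumption here. The function names (log_term, log_marginal, log_bayes_factor) and the toy table are hypothetical. The multinomial coefficient is omitted because it cancels in the Bayes factor.

```python
from math import lgamma

import numpy as np


def log_term(counts, subset, alpha=1.0):
    """Log gamma-function term for the marginal table on the variables in `subset`.

    Assumes (for illustration) a Dirichlet with total mass `alpha`
    spread evenly over the cells of the marginal table.
    """
    other_axes = tuple(ax for ax in range(counts.ndim) if ax not in subset)
    marginal = np.asarray(counts.sum(axis=other_axes)).ravel()  # marginal counts n(i_A)
    a_cell = alpha / marginal.size                              # per-cell hyperparameter
    n_total = marginal.sum()
    out = lgamma(alpha) - lgamma(alpha + n_total)
    out += sum(lgamma(a_cell + n) - lgamma(a_cell) for n in marginal)
    return out


def log_marginal(counts, cliques, separators, alpha=1.0):
    """Log marginal likelihood (up to the multinomial coefficient):
    sum of clique terms minus sum of separator terms."""
    return (sum(log_term(counts, c, alpha) for c in cliques)
            - sum(log_term(counts, s, alpha) for s in separators))


def log_bayes_factor(counts, model_a, model_b, alpha=1.0):
    """Log Bayes factor of model_a against model_b.

    Each model is a pair (cliques, separators), with variables
    identified by their axis index in the contingency table.
    """
    return (log_marginal(counts, *model_a, alpha=alpha)
            - log_marginal(counts, *model_b, alpha=alpha))


# Toy 2 x 2 x 2 contingency table (made-up counts): the three binary
# variables always agree, so they are strongly dependent.
counts = np.zeros((2, 2, 2), dtype=int)
counts[0, 0, 0] = 50
counts[1, 1, 1] = 50

# Chain 0 - 1 - 2: cliques {0,1} and {1,2}, separator {1}.
chain = ([(0, 1), (1, 2)], [(1,)])
# Mutual independence: singleton cliques, empty separators.
independence = ([(0,), (1,), (2,)], [(), ()])

log_bf = log_bayes_factor(counts, chain, independence)
print("log Bayes factor, chain vs independence:", log_bf)
```

A positive value favours the first model. Working on the log scale with lgamma avoids overflow in the gamma functions for realistic sample sizes.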

In this thesis, we study the behaviour of the Bayes factor when the dimension of a true discrete graphical model is fixed and when the dimension increases to infinity with the sample size. We prove that the Bayes factor is strong model selection consistent for both decomposable and non-decomposable discrete graphical models. When the true graph is non-decomposable, we prove that the Bayes factor selects a minimal triangulation of the true graph. We support our theoretical results with various simulations.

In addition, we introduce a variation of the genetic algorithm, called the graphical local genetic algorithm, which can be implemented on large data sets. We use a local search operator and a normalizing constant proportional to the posterior probability of the candidate models to determine optimal submodels, then reconstruct the full graph from the resulting subgraphs. We demonstrate the graphical local genetic algorithm's capabilities both on simulated data sets with known true graphs and on a real-world data set.

Keywords

Statistics
