Bandwidth Selection for Level Set Estimation in the Context of Regression and a Simulation Study for Non Parametric Level Set Estimation When the Density Is Log-Concave

Date

2022-08-08

Authors

Gonzalez Martinez, Gabriela

Abstract

Bandwidth selection is critical for kernel estimation because it controls the amount of smoothing applied by the estimator of a function. Traditional methods for bandwidth selection optimize a global loss function (e.g., least squares cross-validation or asymptotic mean integrated squared error). However, a global loss function is suboptimal for the level set estimation problem, which is local in nature. For a function g, the level set at level λ is the set LS_λ = {x : g(x) ≥ λ}. In the first part of this thesis we study optimal bandwidth selection for the Nadaraya-Watson kernel estimator in one dimension. We present a local loss function as an alternative to the L2 metric and derive an asymptotic approximation of its corresponding risk. The level set optimal bandwidth (h_opt) is the argument that minimizes this asymptotic approximation. We show that the rate of h_opt coincides with the rate of traditional global bandwidth selectors. We then derive an algorithm to obtain the practical bandwidth and study its performance through simulations. Our simulation results show that, in general, for small samples and small levels, the level set optimal bandwidth estimates the level set better than cross-validation bandwidth selection or the local polynomial kernel estimator. We illustrate this new bandwidth selector on a decompression sickness study of the effects of duration and pressure on mortality during a dive.

In the second part, motivated by our simulation findings and by the relationship between level set estimation and the highest density region (HDR) problem, we study via simulations the properties of a plug-in estimator in which the density is estimated with a log-concave mixed model. We focus in particular on univariate densities and compare this method against a kernel plug-in estimator whose bandwidth is chosen optimally for the HDR problem. We observe through simulations that when the number of components in the model is correctly specified, the log-concave plug-in estimator performs better than the kernel estimator at lower levels and comparably at the other levels considered. We conclude with an analysis of the daily maximum temperatures in Melbourne, Australia.
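
To make the objects in the first part concrete, the following is a minimal sketch of a one-dimensional Nadaraya-Watson plug-in level set estimate. It is not the thesis's bandwidth selection algorithm: the Gaussian kernel, the fixed bandwidth h = 0.1, the simulated data, and the function names nw_estimate and level_set_intervals are all assumptions made for illustration.

```python
import numpy as np

def nw_estimate(x_eval, x, y, h):
    """Nadaraya-Watson estimate of g(x) = E[Y | X = x] (Gaussian kernel assumed)."""
    u = (np.asarray(x_eval)[:, None] - np.asarray(x)[None, :]) / h
    w = np.exp(-0.5 * u**2)              # kernel weights, one row per evaluation point
    return (w @ y) / w.sum(axis=1)       # locally weighted average of the responses

def level_set_intervals(grid, g_hat, lam):
    """Plug-in set {x : g_hat(x) >= lam}, returned as intervals of grid points."""
    intervals, start, prev = [], None, grid[0]
    for xi, inside in zip(grid, g_hat >= lam):
        if inside and start is None:
            start = xi                   # entering the level set
        elif not inside and start is not None:
            intervals.append((start, prev))  # leaving it: close at the last inside point
            start = None
        prev = xi
    if start is not None:                # set extends to the right edge of the grid
        intervals.append((start, grid[-1]))
    return intervals

# Hypothetical simulated data: g(x) = sin(pi * x) observed with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(200)

grid = np.linspace(0.0, 1.0, 201)
g_hat = nw_estimate(grid, x, y, h=0.1)   # h is fixed here, not selected optimally
intervals = level_set_intervals(grid, g_hat, lam=0.5)
print(intervals)
```

In this toy setting the true level set at λ = 0.5 is the interval [1/6, 5/6], so the printed intervals can be compared against it while varying h to see how the bandwidth moves the estimated boundary.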

Keywords

Statistics
