Bandwidth Selection for Level Set Estimation in the Context of Regression and a Simulation Study for Non Parametric Level Set Estimation When the Density Is Log-Concave

dc.contributor.advisor: Jankowski, Hanna
dc.contributor.author: Gonzalez Martinez, Gabriela
dc.date.accessioned: 2022-09-14T20:19:01Z
dc.date.available: 2022-09-14T20:19:01Z
dc.date.copyright: 2022-01-07
dc.date.issued: 2022-08-08
dc.date.updated: 2022-09-14T20:19:01Z
dc.degree.discipline: Mathematics & Statistics
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: Bandwidth selection is critical for kernel estimation because it controls the amount of smoothing applied by the estimator of a function. Traditional bandwidth selection methods optimize a global loss function (e.g., least squares cross-validation or asymptotic mean integrated squared error). A global loss function, however, is suboptimal for the level set estimation problem, which is local in nature. For a function $g$, the level set at level $\lambda$ is $LS_\lambda = \{x : g(x) \geq \lambda\}$. In the first part of this thesis, we study optimal bandwidth selection for the Nadaraya-Watson kernel estimator in one dimension. We present a local loss function as an alternative to the $L_2$ metric and derive an asymptotic approximation of its corresponding risk. The level set optimal bandwidth $h_{opt}$ is the argument that minimizes this asymptotic approximation. We show that the rate of $h_{opt}$ coincides with the rate of traditional global bandwidth selectors. We then derive an algorithm to obtain the practical bandwidth and study its performance through simulations. Our simulation results show that, in general, for small samples and small levels, the level set optimal bandwidth estimates the level set better than the cross-validation bandwidth or the local polynomial kernel estimator. We illustrate the new bandwidth selector on a decompression sickness study of the effects of duration and pressure on mortality during a dive.
In the second part, motivated by our simulation findings and by the relationship between level set estimation and the highest density region (HDR) problem, we study via simulations the properties of a plug-in estimator in which the density is estimated with a log-concave mixed model. We focus on univariate densities and compare this method against a kernel plug-in estimator whose bandwidth is chosen optimally for the HDR problem. We observe through simulations that, when the number of components in the model is correctly specified, the log-concave plug-in estimator performs better than the kernel estimator at lower levels and similarly at the other levels considered. We conclude with an analysis of daily maximum temperatures in Melbourne, Australia.
dc.identifier.uri: http://hdl.handle.net/10315/39721
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Statistics
dc.subject.keywords: Bandwidth selection
dc.subject.keywords: Level sets
dc.subject.keywords: Non-parametric regression
dc.title: Bandwidth Selection for Level Set Estimation in the Context of Regression and a Simulation Study for Non Parametric Level Set Estimation When the Density Is Log-Concave
dc.type: Electronic Thesis or Dissertation
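
Illustrative sketch (not part of the thesis record): the abstract describes estimating a level set by plugging a Nadaraya-Watson regression estimate into the definition {x : g(x) ≥ λ}. The Python snippet below is a minimal sketch of that plug-in idea only. The Gaussian kernel, the toy regression function sin(πx), the noise level, the grid, and the hand-picked bandwidth h = 0.05 are all assumptions made for illustration; it does not implement the thesis's local loss function or its level set optimal bandwidth.

```python
import numpy as np


def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson estimate of E[Y | X = x] using a Gaussian kernel."""
    x_eval = np.asarray(x_eval, dtype=float)
    # Scaled distances: evaluation points along rows, observations along columns.
    u = (x_eval[:, None] - x[None, :]) / h
    # Gaussian kernel weights; the normalizing constant cancels in the ratio.
    w = np.exp(-0.5 * u ** 2)
    return (w @ y) / w.sum(axis=1)


def plug_in_level_set(grid, g_hat, lam):
    """Grid points at which the estimated function is at least lam."""
    return grid[g_hat >= lam]


# Hypothetical toy data: g(x) = sin(pi * x) on [0, 1] with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=400)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.size)

grid = np.linspace(0.0, 1.0, 201)
# The bandwidth is fixed by hand here; it is NOT the h_opt of the thesis.
g_hat = nadaraya_watson(grid, x, y, h=0.05)
level_set = plug_in_level_set(grid, g_hat, lam=0.5)

print(f"estimated level set spans [{level_set.min():.3f}, {level_set.max():.3f}]")
```

For this toy function the true level set at λ = 0.5 is the interval [1/6, 5/6], so the printed endpoints can be compared against it. Changing h moves the estimated endpoints, which is the sensitivity that motivates a bandwidth selector targeted at the level set rather than at a global loss.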

Files

Original bundle
Name: GonzalezMartinez_Gabriela_2022_PhD.pdf
Size: 6.21 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text
Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text