
Bayesian Bootstrap #20

Open
ParadaCarleton opened this issue Jul 30, 2021 · 0 comments
ParadaCarleton commented Jul 30, 2021

The goal is to add the Bayesian bootstrap as a first-class alternative to leave-one-out cross-validation in this package. I believe the Bayesian bootstrap's major advantage is letting users plot a full posterior distribution, rather than just a point estimate and standard error. Aside from the actual informational difference, I've noticed that plotting posteriors is a good way to help people intuitively understand that the point estimates aren't special: show someone a point estimate and a standard error, and they will usually either ignore the standard error or construct a 95% normal confidence interval.
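For concreteness, here is a minimal sketch of the basic Bayesian bootstrap (Rubin's scheme: each posterior draw reweights the observations with flat Dirichlet(1, …, 1) weights and evaluates a weighted statistic). This is an illustrative example, not this package's implementation; the function name and statistic are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_bootstrap(data, statistic, n_draws=4000):
    """Draw from the Bayesian bootstrap posterior of a weighted statistic.

    Each draw assigns Dirichlet(1, ..., 1) weights to the observations
    and evaluates statistic(data, weights) under those weights.
    """
    n = len(data)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        w = rng.dirichlet(np.ones(n))   # flat Dirichlet over observations
        draws[i] = statistic(data, w)
    return draws

# Example: posterior for the mean of a sample.
x = rng.normal(loc=2.0, scale=1.0, size=100)
posterior = bayesian_bootstrap(x, lambda d, w: np.sum(w * d))
# `posterior` is a full distribution of draws, so it can be plotted
# directly instead of being collapsed to a point estimate and SE.
```

The point of returning the raw draws is exactly the advantage described above: a histogram or density plot of `posterior` shows users the whole distribution rather than a single summary.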

Mentioning @topipa because Aki suggested I talk to you about this, and said you've built something similar before. I know that the bootstrap tends to underestimate the bias caused by overfitting, because bootstrap resamples will be more similar to the data than a new sample from the original distribution would be -- did your own implementation use any corrections for this bias?

My own thoughts on how to correct this:

  1. Iterated bootstrap techniques, and
  2. Adding random noise to resamples -- instead of assigning a random Dirichlet-distributed probability to every observation x, we can draw random observations x + N, where N ~ Normal(0, Σ / n), and then assign a random Dirichlet probability to each of these resamples. I've seen Tim Hesterberg suggest this in his textbook, but Aki seemed to suggest it would be a bad idea. Intuitively I'd expect this to reduce the bias caused by resampling, since we'd at least be getting the variance of the underlying distribution correct, but I could be wrong.
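Option 2 above can be sketched as follows. This is a hedged illustration of the noise-added resampling idea, not a vetted correction: for simplicity it handles scalar data, where Σ / n reduces to the sample variance over n; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_bayesian_bootstrap(data, statistic, n_draws=4000):
    """Sketch of option 2: jitter each observation x with noise
    N ~ Normal(0, var(data) / n), then assign flat Dirichlet weights
    to the jittered points (scalar-data case of Sigma / n)."""
    n = len(data)
    scale = np.std(data, ddof=1) / np.sqrt(n)  # sd of Normal(0, sigma^2 / n)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        jittered = data + rng.normal(0.0, scale, size=n)  # x + N
        w = rng.dirichlet(np.ones(n))
        draws[i] = statistic(jittered, w)
    return draws

x = rng.normal(size=200)
posterior = smoothed_bayesian_bootstrap(x, lambda d, w: np.sum(w * d))
```

Compared with the plain Bayesian bootstrap, the only change is the jitter step, which inflates each draw's spread toward the variance of the underlying distribution, per the intuition in point 2.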

I have an initial implementation of a basic BB here, although it's not quite working yet -- the estimates seem to be slightly off, but I'm not sure why.
