Embedded Laplace approximation #755
Is the only use case you anticipate through the Stan language? Is the idea to have approximate distributions, like poisson_approx or something? This issue eventually needs to specify the signatures of any functions being proposed for the math library. @charlesm93: I assigned this to you for v3.0.0 (that's where I'm putting things we definitely want to do, but not necessarily for the next release). It helps to add labels to classify the issue.
This is part of the path towards INLA speed for Gaussian latent variable models in Stan. The anticipated first use case is compound functions for GLMs with Laplace integration over the weights. Issue #493 describes a compound function where the speed-up is obtained with analytic gradients. With a Laplace approximation it would be possible to add a similar compound function that would assume a Gaussian prior on beta with sd sigma, and would use the Laplace approximation to integrate over beta. Instead of logistic, the first ones to implement would be Normal, Poisson, Binomial, and Negative-Binomial. Eventually there will be different structured Gaussian priors, such as hierarchical, Markov random field, and Gaussian process priors. Ping @dpsimpson
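For context, the quantity such a compound function would return is the Laplace approximation to the marginal likelihood obtained by integrating out the weights (notation below is mine, not from the issue):

$$
p(y \mid \sigma) = \int p(y \mid X\beta)\, \mathrm{N}(\beta \mid 0, \Sigma(\sigma))\, d\beta
\;\approx\; p(y \mid X\hat\beta)\, \mathrm{N}(\hat\beta \mid 0, \Sigma)\, (2\pi)^{d/2} \bigl|\Sigma^{-1} + W\bigr|^{-1/2},
$$

where $\hat\beta$ maximizes $p(y \mid X\beta)\,\mathrm{N}(\beta \mid 0, \Sigma)$, $W = -\nabla^2_\beta \log p(y \mid X\hat\beta)$, and $d = \dim(\beta)$.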
@avehtari this sounds great... shouldn't we have a different sigma for the intercept and all other regressors? I would appreciate that, as I'm not fond of the data-driven priors you essentially end up with if you can only give a single sigma and hence have to transform your data. Maybe too early for these comments, but relevant from my perspective.
Sure. I didn't say that sigma would be a scalar. We'll make a more detailed proposal soon.
Copy pasta from email:

One other little thing if you want to squeeze out the last little bits of speed. Note that Stan Math functions always return an actual matrix, whereas one of the main benefits of Eigen is composing multiple expressions so that at compile time they are written out very efficiently. If you are working with matrices of doubles, you should use Eigen methods for faster code. Here's a trick you can do with things like third_diff(), etc.:

```cpp
template <typename T>
Eigen::Matrix<T, Eigen::Dynamic, 1>
third_diff(const Eigen::Matrix<T, Eigen::Dynamic, 1>& theta) const {
  Eigen::VectorXd exp_theta = exp(theta);
  Eigen::VectorXd one = Eigen::VectorXd::Ones(theta.size());
  // Note the auto type!!
  auto numerator = exp_theta.cwiseProduct(exp_theta - one);
  auto denominator
      = (one + exp_theta).array().sqrt().matrix().cwiseProduct(one + exp_theta);
  return n_samples_.cwiseProduct((numerator.array() / denominator.array()).matrix());
}
```

The main bit here is that `numerator` and `denominator` being `auto` means they are a wacky expression type like `ColWiseProd<CWiseSubtract<Matrix, Matrix>, Matrix>` that is not actually evaluated at that line. So when you pass those into the return line, the expression for the result is generated fully. (You could also just slap the code for `numerator` and `denominator` into the return line, but then it starts becoming a bit of a mouthful.) But you should only do this in places where you use the expression once; otherwise it's better to eval it into a temporary, like you are currently doing with `one`. Here's an example that shows on the rhs the weird expression types that `numerator`, `denominator`, and the return type get: https://godbolt.org/z/7bE5n1
Also writing the laplace vari using the […]
This is so cool that this is possible now!
But this is the wrong Laplace branch. The good one is here
https://github.com/stan-dev/math/tree/try-laplace_approximation2/stan/math/laplace
Whoops, good catch! I just searched for the Laplace issue and this was the nearest-looking one.
Good catch @dpsimpson. The reason we have two branches is that the first branch was written when we submitted the manuscript on the adjoint-differentiated Laplace approximation (with the code discussed in the paper). The second branch came after discussions on how to improve the code before StanCon, but we didn't want to break "backward compatibility" with the paper. We get to update the manuscript before its publication, so I'll link to the new branch and delete the old one.
I deleted a bunch of files which I used for my research but wouldn't make it into production code. The commit with these changes, and the commit prior to it should we need to retrieve those files, are dated December 5th. One file we might want to pull back out at some point is the draft for the negative binomial likelihood. Currently, every unit test under […]. Since we might want to further compare autodiff and analytical derivatives, I think it's fine to keep this code in for now.
For any future modifications, the unit tests that currently pass should be:

All these tests check for consistency between (first-order reverse-mode) autodiff and finite differences. For most tests I have benchmarks from GPstuff, but where indicated in the code a benchmark is missing.
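As a side note, the autodiff-versus-finite-difference checks described here follow a general pattern like the sketch below. This is not the branch's actual test code: the functor is a stand-in for the Laplace functors, and I'm assuming the generic `stan::math::gradient` and `stan::math::finite_diff_gradient_auto` helpers.

```cpp
#include <stan/math.hpp>
#include <gtest/gtest.h>
#include <Eigen/Dense>

// Stand-in target; the real tests exercise the laplace_marginal_* functors.
struct toy_log_density {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& phi) const {
    return stan::math::sum(stan::math::exp(phi)) - 0.5 * stan::math::dot_self(phi);
  }
};

TEST(laplace, autodiff_vs_finite_diff) {
  Eigen::VectorXd phi(2);
  phi << 0.5, -1.3;

  double fx_ad, fx_fd;
  Eigen::VectorXd grad_ad, grad_fd;
  // Reverse-mode autodiff gradient.
  stan::math::gradient(toy_log_density(), phi, fx_ad, grad_ad);
  // Finite-difference gradient of the same functor.
  stan::math::finite_diff_gradient_auto(toy_log_density(), phi, fx_fd, grad_fd);

  EXPECT_NEAR(fx_ad, fx_fd, 1e-8);
  for (int i = 0; i < phi.size(); ++i)
    EXPECT_NEAR(grad_ad(i), grad_fd(i), 1e-6);
}
```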
Todo list for myself:

- [ ] write a primer design doc (per @SteveBronder's request)
- [ ] complete unit tests for existing functions using benchmarks from GPstuff
- [ ] work out unit tests for the rng functions

EDIT: updated 01/16.
Let me know if you need help!
What would be the best way to test the rng? I can try to come up with limiting cases for each example, but I'm wondering if there is a general approach. How was this done for INLA and GPstuff?
It's hard to work out how to test this as it currently is, because the function does three things: computes the Laplace approximation (tested elsewhere), computes the mean and covariance matrix (not tested), and does the multi-normal rng (function tested, this specific call not). My temptation (which might not be very smart) would be to separate the computation of the mean and the covariance into an inline function and test it separately. Then you can test the rest of it in the same way that, say, multi_normal_prec_rng is tested.
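A minimal sketch of what that split could look like. The names `laplace_approx_mean_cov` and `laplace_base_rng` are illustrative only, not functions on the branch, and `multi_normal_rng` stands in for whichever generator the rng actually uses.

```cpp
#include <stan/math.hpp>
#include <Eigen/Dense>
#include <utility>

// Testable piece: returns the mean and covariance of the approximating normal.
// (The actual computation is elided; this only fixes the interface.)
inline std::pair<Eigen::VectorXd, Eigen::MatrixXd>
laplace_approx_mean_cov(const Eigen::VectorXd& theta_0,
                        const Eigen::MatrixXd& covariance) {
  Eigen::VectorXd mu = theta_0;        // stand-in for the mode found by the solver
  Eigen::MatrixXd Sigma = covariance;  // stand-in for the conditional covariance
  return {mu, Sigma};
}

// The rng is then a thin wrapper around an already-tested generator.
template <class RNG>
inline Eigen::VectorXd laplace_base_rng(const Eigen::VectorXd& theta_0,
                                        const Eigen::MatrixXd& covariance,
                                        RNG& rng) {
  std::pair<Eigen::VectorXd, Eigen::MatrixXd> mc
      = laplace_approx_mean_cov(theta_0, covariance);
  return stan::math::multi_normal_rng(mc.first, mc.second, rng);
}
```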
Ripping code apart into testable pieces is not at all a bad idea, from what I've learned.
The other option is to do a probabilistic test that runs the rng 1k+ times and computes the mean and covariances, but that feels like it would be a very expensive test (and it would occasionally fail). I can't think of another way to test a multivariate distribution without direct access to the mean and covariance matrix.
Actually, there is one other possibility that depends on how the […]
There are at least three concerns that need to be balanced:

I think @wds15 just means that (1) is usually opposed to (2) and (3).
One different approach I took for one of my R packages where I wanted to run SBC is to proxy the test results from the SBC run. What I do is:

1. Run SBC on a cluster and store the final histograms needed to calculate the chi-square statistic. This is stored with a time-stamp.
2. The unit test then
   a) checks that the time-stamp of the stored SBC run is no older than 0.5 years as compared to the runtime of the test, and
   b) tests for uniformity via a chi-square test, allowing some failures according to what I would expect.

This way I can have a massive SBC simulation from which the key results are part of the test evidence, and I ensure that the SBC outputs are not too old. That's not ideal, but a compromise. Maybe a more sophisticated version of this would be useful here as well?
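For illustration, a rough C++ rendering of that proxy test (the original lives in an R package). The artifact struct, the half-year freshness window, the 20-bin assumption, and the hard-coded critical value are all assumptions made for this sketch.

```cpp
#include <gtest/gtest.h>
#include <chrono>
#include <numeric>
#include <vector>

struct sbc_artifact {
  std::chrono::system_clock::time_point timestamp;  // when the cluster run finished
  std::vector<double> rank_histogram;               // binned SBC ranks
};

// Check (a): the stored run is recent enough, and (b): the rank histogram is
// consistent with uniformity via a chi-square statistic.
void check_sbc_artifact(const sbc_artifact& sbc) {
  auto age = std::chrono::system_clock::now() - sbc.timestamp;
  EXPECT_LT(age, std::chrono::hours(24 * 183));  // roughly half a year

  double n = std::accumulate(sbc.rank_histogram.begin(), sbc.rank_histogram.end(), 0.0);
  double expected = n / sbc.rank_histogram.size();
  double chi_sq = 0.0;
  for (double observed : sbc.rank_histogram)
    chi_sq += (observed - expected) * (observed - expected) / expected;

  // Assumed critical value for 19 degrees of freedom (20 bins) at alpha ~ 0.01;
  // a real test would compute this from the actual bin count.
  EXPECT_LT(chi_sq, 36.2);
}
```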
I think that's too heavy for such a simple function. Equality of a matrix and a vector, OR equality of a draw made with the same random seed, is enough to show correctness. (Actually this has to happen twice, because there's a factored and a non-factored version.)
I ended up running with one of @dpsimpson's suggestions and implementing it in […]. I then ran a bunch of simulations (1e6) and checked estimators of the mean, variance, and covariance. On a two-group problem, this test takes 8 seconds to run. It's not ideal, but it's not horrendous. I'm not sure how to control the rng state. Both […], but I couldn't fix this state. I'm assuming it's possible, in which case we can implement Dan's second proposition. It's a bit of work to get the analytical results, but only a bit, and I think it's good to have unit tests which rely on analytical calculations.
Thanks @SteveBronder for showing me how to control the random seed. Using this, I was able to produce the same sample using the embedded Laplace and […]. Is this enough? I think this checks the boxes, i.e. making sure the rng function generates a sample from the correct approximating normal distribution.
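The seed-based check looks roughly like the sketch below. The second draw is a placeholder where the embedded Laplace rng call would go; I'm only assuming the standard `boost::ecuyer1988` generator used in Stan Math tests and `stan::math::multi_normal_rng`.

```cpp
#include <stan/math.hpp>
#include <gtest/gtest.h>
#include <boost/random/additive_combine.hpp>
#include <Eigen/Dense>

TEST(laplace_rng, matches_multi_normal_with_same_seed) {
  unsigned int seed = 1954;

  Eigen::VectorXd mu(2);
  mu << 0.1, -0.4;
  Eigen::MatrixXd Sigma(2, 2);
  Sigma << 1.0, 0.3,
           0.3, 1.0;

  // Reference draw from the already-tested generator.
  boost::ecuyer1988 rng1(seed);
  Eigen::VectorXd draw_reference = stan::math::multi_normal_rng(mu, Sigma, rng1);

  // Placeholder: with the same seed, the embedded Laplace rng should produce
  // the same draw when its mean/covariance equal mu/Sigma.
  boost::ecuyer1988 rng2(seed);
  Eigen::VectorXd draw_laplace = stan::math::multi_normal_rng(mu, Sigma, rng2);

  for (int i = 0; i < mu.size(); ++i)
    EXPECT_DOUBLE_EQ(draw_reference(i), draw_laplace(i));
}
```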
Seems good! Is this all up on a branch somewhere?

Yes.
Just making a note that we removed an experimental feature to flip between diagonal and non-diagonal covariance estimates in commit 6b97f1f. We also removed the […]
I'm going to work on the following update:

Update: ... and job done. Next up: update the notebook and the examples therein (and done!)
Up next, I want to tackle the sparse covariance case. Of interest is the scenario where the Hessian and the covariance are both sparse, as is the case in a model from our BLS colleagues. In this scenario the B-matrix (the one matrix on which we perform a Cholesky decomposition) also becomes sparse. If B is diagonal, all operations become O(n), as opposed to O(n^3). More generally, if m is the largest block size between the Hessian and the covariance, the complexity of the method would be O(nm^2). Before implementing this at a production level, I want to build a proof of concept. My idea would be to add an optional argument:
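This sketch does not attempt to reconstruct the elided optional argument; it only illustrates why a diagonal B makes everything O(n): the Cholesky factor, the log determinant, and the solve all reduce to elementwise operations on the stored diagonal.

```cpp
#include <Eigen/Dense>

struct diagonal_B_ops {
  Eigen::VectorXd diag;  // diagonal of B, assumed positive

  // "Cholesky factor": elementwise square root, O(n).
  Eigen::VectorXd cholesky() const { return diag.array().sqrt().matrix(); }

  // log |B| = sum of log diagonal entries, O(n).
  double log_determinant() const { return diag.array().log().sum(); }

  // Solve B x = b: elementwise division, O(n) instead of O(n^3).
  Eigen::VectorXd solve(const Eigen::VectorXd& b) const {
    return (b.array() / diag.array()).matrix();
  }
};
```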
Moving this to #3065
Summary:
Want to create a Laplace approximation for the following distributions:
Description:
Doing a Laplace approximation with the current algebraic solver (which uses a dogleg solver) is prohibitively slow. Instead we could use a Newton line-search method, coupled with some analytical derivatives. This line-search method would be specialized and therefore not exposed to the Stan language, but the approximator would be. Aki and Dan have a prototype in Stan, which uses autodiff. The C++ implementation would naturally be more optimized.
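A minimal sketch of such a Newton iteration (mine, not the prototype mentioned above), written in the usual B = I + W^{1/2} K W^{1/2} form. The line search and the analytical likelihood derivatives are omitted; the gradient and the (diagonal) negative Hessian of the log likelihood are passed in as callables.

```cpp
#include <Eigen/Dense>
#include <functional>

// Newton iteration for the mode of p(theta | y) ∝ p(y | theta) N(theta | 0, K).
inline Eigen::VectorXd laplace_newton_mode(
    const Eigen::MatrixXd& K,  // prior covariance
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& grad_log_lik,
    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& neg_hessian_diag,
    int max_iter = 100, double tol = 1e-8) {
  const int n = K.rows();
  Eigen::VectorXd theta = Eigen::VectorXd::Zero(n);
  for (int iter = 0; iter < max_iter; ++iter) {
    Eigen::VectorXd W = neg_hessian_diag(theta);  // diagonal for factorizing likelihoods
    Eigen::VectorXd W_sqrt = W.cwiseSqrt();
    // B = I + W^{1/2} K W^{1/2} is symmetric positive definite for log-concave likelihoods.
    Eigen::MatrixXd B = Eigen::MatrixXd::Identity(n, n)
        + W_sqrt.asDiagonal() * K * W_sqrt.asDiagonal();
    Eigen::LLT<Eigen::MatrixXd> B_llt(B);
    // Newton step: theta_new = (K^{-1} + W)^{-1} (W theta + grad), rewritten via
    // the matrix inversion lemma so only B needs to be factorized.
    Eigen::VectorXd b = W.cwiseProduct(theta) + grad_log_lik(theta);
    Eigen::VectorXd a = b - W_sqrt.cwiseProduct(B_llt.solve(W_sqrt.cwiseProduct(K * b)));
    Eigen::VectorXd theta_new = K * a;
    if ((theta_new - theta).norm() < tol) return theta_new;
    theta = theta_new;
  }
  return theta;
}
```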
Current Version:
v2.17.0