Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closures #2384

Closed
wants to merge 35 commits into from
Closed

Closures #2384

wants to merge 35 commits into from

Conversation

nhuurre
Copy link
Collaborator

@nhuurre nhuurre commented Feb 22, 2021

Summary

Issue #2197

Extends the following functions to support closure-like objects:

count_vars(...)
deep_copy_vars(...)
accumulate_adjoints(...)
save_varis(...)
zero_adjoints(...)
value_of(...)

That should be enough for any variadic higher-order function to support closures almost automatically.

The stanc3 PR exposing this in the language is stan-dev/stanc3#742

Tests

I added a couple of tests for integrate_ode_rk45, ode_rk45, and reduce_sum.

Side Effects

I don't think so.

Release notes

Implement closures.

Checklist

  • Math issue Implement closures #2197

  • Copyright holder: Niko Huurre

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

This was referenced Feb 22, 2021
@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.53 3.42 1.03 3.09% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.77% slower
eight_schools/eight_schools.stan 0.11 0.11 1.04 3.54% faster
gp_regr/gp_regr.stan 0.16 0.15 1.03 2.89% faster
irt_2pl/irt_2pl.stan 5.21 5.22 1.0 -0.07% slower
performance.compilation 91.18 88.59 1.03 2.85% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.86 7.81 1.01 0.64% faster
pkpd/one_comp_mm_elim_abs.stan 29.64 28.95 1.02 2.31% faster
sir/sir.stan 133.7 127.49 1.05 4.65% faster
gp_regr/gen_gp_data.stan 0.04 0.04 0.99 -1.09% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.04 3.05 1.0 -0.35% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.41 0.41 1.01 1.34% faster
arK/arK.stan 2.49 2.51 0.99 -0.72% slower
arma/arma.stan 0.72 0.72 1.0 -0.03% slower
garch/garch.stan 0.65 0.67 0.98 -1.92% slower
Mean result: 1.01076065729

Jenkins Console Log
Blue Ocean
Commit hash: ee21600


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@nhuurre
Copy link
Collaborator Author

nhuurre commented Feb 23, 2021

A quick example:

vector[N] v = rep_vector(1.0, N);
real myfunc(row_vector s) {
  return s * v;
}

The C++ type of myfunc is approximately

template<bool Ref>
class myfunc_type {
  capture_type_t<VectorXd, Ref> v;
  public:
  auto operator()(const auto& s) {
    return s * v;
  }
  // ...
  // + methods for accessing autodiff stack of the private member v
}

When the closure is first created the template type capture_type_t is a constant reference.

class myfunc_type {
  const VectorXd& v; // capture_type_t<VectorXd, true>

That's efficient but means the closure object cannot outlive the scope in which v was defined. If you need to use the closure outside of its initial scope (e.g. adjoint ode autodiff needs it during the reverse pass) then you must call myfunc.copy_of__() or myfunc.deep_copy_vars__() to make closure that holds copies of its captures.

class myfunc_type {
  VectorXd v; // capture_type_t<VectorXd, false>

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.43 3.48 0.98 -1.72% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -0.6% slower
eight_schools/eight_schools.stan 0.11 0.11 0.97 -3.16% slower
gp_regr/gp_regr.stan 0.15 0.16 0.97 -2.63% slower
irt_2pl/irt_2pl.stan 5.22 5.21 1.0 0.17% faster
performance.compilation 89.92 88.91 1.01 1.13% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.03 8.0 1.0 0.34% faster
pkpd/one_comp_mm_elim_abs.stan 29.16 30.59 0.95 -4.91% slower
sir/sir.stan 132.58 139.81 0.95 -5.45% slower
gp_regr/gen_gp_data.stan 0.04 0.04 0.98 -2.55% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.03 3.03 1.0 0.18% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.39 0.97 -2.95% slower
arK/arK.stan 2.5 2.48 1.01 0.55% faster
arma/arma.stan 0.69 0.69 1.0 -0.19% slower
garch/garch.stan 0.65 0.65 0.99 -1.13% slower
Mean result: 0.985301669411

Jenkins Console Log
Blue Ocean
Commit hash: a595b43


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Member

@bbbales2 bbbales2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few questions/comments

inline void elementwise_check(const F& is_good, const char* function,
const char* name, const T& x, const char* must_be,
const Indexings&... indexings) {
// XXX skip closures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We'll need to implement these.

Copy link
Member

@bbbales2 bbbales2 Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking more, I don't think there's a big advantage to implementing this.

We could expose the variables captured by a closure to checks, but the Math checks wouldn't know in what order its getting them, and then depending on which function was accepting closures it would need to decide which checks to do on which inputs.

I think instead in the ODE solves we check only the arguments passed in explicitly (which this is effectively doing) or we get rid of the infinity checks on the inputs to the ODE solves. I'll make an issue and see if getting rid of the checks altogether is an option. (Edit: Issue #2406)

template <typename F, require_stan_closure_t<F>* = nullptr,
require_not_st_arithmetic<F>* = nullptr>
inline auto value_of(const F& f) {
return f.value_of__();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow I vaguely remember this. Is this used anywhere yet?

}

template <bool Propto, typename F, bool Ref>
struct lpdf_wrapper {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is lpdf_wrapper used for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you know how one can do

real func_lpdf(real[] slice, ...) {
  ...
}
target += reduce_sum(func_lpdf, ...);
target += reduce_sum(func_lupdf, ...);

and both reduce_sum calls take the same closure object as the first argument so you should be asking yourself, how does the closure remember if it's lpdf or lupdf?
The answer is that it's wrapped in lpdf_wrapper right before calling reduce_sum. Propto=true means lupdf and Propto=false means lpdf.

template <typename F>
auto lp_from_lambda(const F& f) {
return empty_closure_lp<F>(f);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for each kind of function we might have in Stan, we can also have a closure version of that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each function kind follows a different calling convention so each kind needs its own adapter closure. These aren't used in math library but stanc3 allows userdefined higher order functions that might need them.

* @ingroup type_trait
*/
template <typename F, typename... Args>
using fn_return_type_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this isn't part of return_type_t? One less template function to remember would be nice, but if there's a reason to keep it separate that's good too.

using type
= scalar_lub_t<scalar_type_t<T>, typename return_type<Ts...>::type>;
using type = scalar_lub_t<scalar_type_t<T>,
typename return_type<scalar_type_t<Ts>...>::type>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where'd the extra scalar_type_t come from here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good question.
If I understand it correctly return_type_t already applies scalar_type_t to each argument due to the recursive definition and scalar_type_t is idempotent so the extra one does nothing. You can revert this and compile reduce_sum_closure_test to see what happens. (Spoiler alert: I do not understand it correctly.)

*
* @tparam F A closure type
* @param f A closure of vars
* @return A new std::vector of vars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's returning a new closure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad copy-paste.

res[1][1].grad();
EXPECT_FLOAT_EQ(t0v.adj(), -0.38494826636037426937);
stan::math::set_zero_all_adjoints();
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to check the adjoint of a in here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these tests are quite incomplete. The function doesn't even do anything with a so it only tests that autodiffable closures compile. How do people come up with good test cases?

Copy link
Member

@bbbales2 bbbales2 Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it will be a depressing amount of copy-paste. I think we can do something like:

y' = -recursive_sum(captured_args...) * y;

where recursive_sum just keeps summing everything until out pops a scalar. And then we can test that this basic example works for capturing at least a scalar, a matrix, and array, and then a couple combinations of the above.

And similarly we'll add tests for reduce_sum and whatnot. Let's only do the boring tests once everything else is in place though (integrate + algebra_solver tests). I will help with those. Only need enough tests now to convince ourselves things can work (so just a couple are fine).

@bbbales2
Copy link
Member

bbbales2 commented Mar 3, 2021

@nhuurre if you get a chance can you review #2397? When that is in I'll do a version for algebra solver, and then when we have that then we can support all the high order stuff except map_rect.

Then I guess it's a matter of adding tests here and the Math part of this is done.

@bob-carpenter
Copy link
Contributor

where recursive_sum just keeps summing everything until out pops a scalar.

I think we already have templates that do this for existing types in the math library. The general recursive template pattern is also used for the vectorized form of scalar real functions.


template <typename... Args>
auto operator()(std::ostream* msgs, Args... args) const {
return f_(s_, args..., msgs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having functions where the variadic arguments aren't last makes writing the C++ painful (cause no type deduction). Could it go captured_args..., msgs, args...?

@nhuurre
Copy link
Collaborator Author

nhuurre commented Mar 6, 2021

@bbbales2

if you get a chance can you review #2397? When that is in I'll do a version for algebra solver, and then when we have that then we can support all the high order stuff except map_rect.

The PR looks good to me but I'm not very familiar with integrate_1d code.

If you're looking into the algebra solver then I'm pinging @charlesm93 because the last post in the ancient Discourse thread is him saying he'd do it. Not sure if he's still up to it but anyway.

And I think map_rect is feasible. I have a partial implementation of serialization, start here. Does the second test in that file correctly imitate MPI data flow?

@bbbales2
Copy link
Member

bbbales2 commented Mar 6, 2021

Does the second test in that file correctly imitate MPI data flow?

Yeah, this looks right. I'd have to dig in more to the MPI to know. As the caching is implemented now can lead to bugs.

The current map_rect takes a function:

vector f(vector phi, vector theta, data real[] x_r, data int[] x_i);

phi are parameters shared between every node.

So if we took a closure, g(vector phi, vector theta, data real[] x_r, data int[] x_i) => vector and the compiler built a non-closure function h that wrapped g with the appropriate deserialization then I think this would work.

So h would look something like:

vector h(vector phi, vector theta, data real[] x_r, data int[] x_i) {
  g_closure g(closure_vars(phi), closure_x_r(x_r), closure_x_i(x_i));
  return g(non_closure_part_of_phi(phi), theta, non_closure_part_of_x_r(x_r), non_closure_part_of_x_i(x_i));
}

And so what you have looks right to me. I'd have to think more closely to know for sure though.

@bbbales2
Copy link
Member

bbbales2 commented Mar 6, 2021

I think the routine would be define another map_rect that takes in a first argument that is a closure. This map_rect would pack up all the data and then call the regular map_rect (regular map_rect is here).

Regular map_rect takes as a template argument a functor type F, which is the thing that will get called. I'm not sure if this is a thing that would need constructed by the compiler, or if we can build the F from the type of the functor that got passed in.

Regardless, we would need to build a special F that unpacked all the data and then called our closure function. I think if we can do those two things we're good.

@bbbales2
Copy link
Member

bbbales2 commented Mar 6, 2021

Also there are some weird macros floating around that I'm not sure about: https://github.com/stan-dev/math/blob/develop/stan/math/prim/functor/map_rect_mpi.hpp#L48

@SteveBronder
Copy link
Collaborator

Is this still being worked on? Happy to take on the review if so!

@nhuurre
Copy link
Collaborator Author

nhuurre commented Apr 15, 2021

Is this still being worked on?

I haven't resolved the merge conflicts here because I figured everyone is too busy with the adjoint ODE stuff anyway. I do have a branch that is up to date with it feature/adjoint-ode...nhuurre:closures-adjoint-ode (huge diff because the adjoint ODE PR hasn't merged in develop for a while, I think)
You can download https://github.com/nhuurre/cmdstan/releases/download/closures-v0.2/cmdstan-closures-adjoint-ode.tar.gz

IIRC I found some benchmark model in the adjoint ode discussion and tried running it with no changes to Stan code, so basically an empty closure, it was like 10% slower which was pretty depressing because I have no idea where to start debugging that.

@wds15
Copy link
Contributor

wds15 commented Apr 15, 2021

Don‘t wait for adjoint ODE...I am not sure how busy this is keeping others than myself. However, I will merge develop into the adjoint branch rather soon, since now the ODE testing branch is in develop.

@wds15
Copy link
Contributor

wds15 commented Apr 15, 2021

FYI: the adjoint ODE branch is now up to date with develop

@nhuurre
Copy link
Collaborator Author

nhuurre commented Jun 20, 2021

The new closure ODE stuff is 10% faster for this example.

Your code seems to test a closure on cmdstan-closures against a bare function on cmdstan-closures instead of a function on cmdstan-develop. For simplicity I removed the ode_functor structs in favor of wrapper closures so it's possible that what you're seeing is a 10% slowdown for everything that isn't a closure.

@wds15
Copy link
Contributor

wds15 commented Jun 22, 2021

Ok, I see. So I get:

  • vanilla 2.27: 52.5s
  • closure branch, no closures: 58s
  • closure branch, closures: 53.8s

so it looks like the closure stuff is taking off a little bit.

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.01 3.04 0.99 -0.88% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.93 -7.97% slower
eight_schools/eight_schools.stan 0.11 0.1 1.09 8.63% faster
gp_regr/gp_regr.stan 0.16 0.16 1.03 2.8% faster
irt_2pl/irt_2pl.stan 5.89 5.86 1.01 0.51% faster
performance.compilation 87.48 87.4 1.0 0.1% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.67 8.54 1.01 1.43% faster
pkpd/one_comp_mm_elim_abs.stan 29.39 30.33 0.97 -3.22% slower
sir/sir.stan 127.61 127.05 1.0 0.44% faster
gp_regr/gen_gp_data.stan 0.03 0.04 0.98 -2.49% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.03 2.97 1.02 1.83% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.39 0.97 -2.63% slower
arK/arK.stan 1.88 1.86 1.01 1.27% faster
arma/arma.stan 0.93 0.83 1.11 10.04% faster
garch/garch.stan 0.63 0.53 1.2 16.46% faster
Mean result: 1.02165497835

Jenkins Console Log
Blue Ocean
Commit hash: 75f6d30


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! Got a few things, like names etc but overall looks good! I made some changes in the branch below that handle references and things a bit nicer and removes fn_return_type if you want to merge it in. I only ran

./runTests.py -j28 ./test/unit/math/ -f closure

though those seem to pass
nhuurre/math@feature/closures-v2...stan-dev:review1/closures

/**
* A closure that wraps a C++ lambda and captures values.
*/
template <bool Ref, typename F, typename... Ts>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all needs docs for template parameters etc.

Comment on lines 31 to 33
return apply([this, msgs, &args...](
const auto&... s) { return f_(s..., args..., msgs); },
captures_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] When capturing this I prefer to use this-> before the member functions so they are a bit easier to see

Comment on lines 19 to 22
using captured_scalar_t__ = return_type_t<Ts...>;
using ValueOf__
= base_closure<false, F, decltype(eval(value_of(std::declval<Ts>())))...>;
using CopyOf__ = base_closure<false, F, Ts...>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have docs for what these are as their definitions also change across the different types of closures


template <typename... Args>
auto operator()(std::ostream* msgs, const Args&... args) const {
return apply([this, msgs, &args...](
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] I like having things captured by reference at the front and then things copied after

Comment on lines +73 to +74
template <bool propto>
auto with_propto() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use camelcase for template parameters. How is this propto different from Propto in the template parameters of the class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for like lupdf or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's an lupdf hack. This normalization nonsense is rather confusing so here's an example:

parameters {
  real y[100];
}
model {
  function
  real higher_lpdf(real[] x, real(real[], int, int) f_lpdf) {
    real lp = 0;
    lp += reduce_sum(f_lpdf, x, 1);  // <-- A
    lp += reduce_sum(f_lupdf, x, 1); // <-- B
    return lp;
  }
  function
  real partial_lpdf(real[] x, int s, int e) {
    return std_normal_lupdf(x|);
  }
  target += higher_lpdf( y| partial_lpdf);  // <-- 1
  target += higher_lupdf(y| partial_lupdf); // <-- 2
}

Using lpdf instead of lupdf anywhere makes the return value normalized.
In the example reduce_sum is called four times, at (1A), (1B), (2A), (2B), and only (2B) is unnormalized.

The above compiles to C++ that looks something like

auto higher_lpdf = from_lambda([&](auto f_lpdf) {
  var lp = 0;
  lp += reduce_sum(f_lpdf.with_propto<false>(), x, 1); // <-- A
  lp += reduce_sum(f_lpdf.with_propto<true>(), x, 1);  // <-- B
  return lp;
});
auto partial_lpdf = from_lambda([]<bool propto>(auto x, int s, int e) {
  return std_normal_lpdf<propto>(x);
});
lp_accum__.add(higher_lpdf(y, partial_lpdf.with_propto<false>()); // <-- 1
lp_accum__.add(higher_lpdf(y, partial_lpdf.with_propto<true>());  // <-- 2

Every time the closure object is passed to a higher-order function Propto records if it's in its lpdf or lupdf form.

= base_closure<false, F, decltype(eval(value_of(std::declval<Ts>())))...>;
using CopyOf__ = base_closure<false, F, Ts...>;
F f_;
std::tuple<capture_type_t<Ts, Ref>...> captures_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the higher level logic for ref? Aka why can't these always just be references?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_of() and deep_copy_vars() cannot return references.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_of() can return references. But yeah deep_copy_vars() cannot. How about something like this?

nhuurre/math@feature/closures-v2...stan-dev:review2/closure

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could you add some tests and docs for all these? It would help me understand what your design goal is and how things should work

struct closure_lp {
using captured_scalar_t__ = return_type_t<Ts...>;
using ValueOf__ = closure_lp<Propto, true, F, Ts...>;
using CopyOf__ = closure_lp<Propto, true, F, Ts...>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ standard reserves double underscore for the compiler implementations (yes we do this at the stanc3 level but it's not good and we should not do it here)

Comment on lines +250 to +260
template <typename F, require_stan_closure_t<F>* = nullptr,
require_arithmetic_t<return_type_t<F>>* = nullptr>
inline double integrate_1d(const F& f, double a, double b,
const std::vector<double>& theta,
const std::vector<double>& x_r,
const std::vector<int>& x_i, std::ostream* msgs,
const double relative_tolerance
= std::sqrt(EPSILON)) {
return integrate_1d_impl(integrate_1d_closure_adapter(), a, b,
relative_tolerance, msgs, f, theta, x_r, x_i);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Think I'm just missing some context, why is f being passed as an argument here? instead of being pushed to integrate_1d_adapter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

integrate_1d_impl already knows how to handle the autodiff variables passed as arguments but would need new logic for extracting them from the functor.


std::vector<std::vector<return_type_t<T_y0, T_param, T_t0, T_ts>>>
std::vector<std::vector<fn_return_type_t<F, T_y0, T_param, T_t0, T_ts>>>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just have the logic in return_type_t for handling closures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I did have some logic for handling closures in return_type_t but for backwards compatibility some places need logic for also handling arbitrary C++ lambdas. I created fn_return_type_t because I wasn't sure if adding a catchall case for "anything that isn't a known type" would have some undesirable side effects. Your branch seems to have resolved that issue.

Comment on lines 68 to 72
template <typename T>
struct capture_type<T, false,
require_stan_closure_t<std::remove_reference_t<T>>> {
using type = typename std::remove_reference_t<T>::CopyOf__;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this stan_lambda_capture_type or something of the sort

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 2.99 2.99 1.0 0.23% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.02 1.96% faster
eight_schools/eight_schools.stan 0.1 0.1 1.0 -0.25% slower
gp_regr/gp_regr.stan 0.16 0.16 1.0 0.13% faster
irt_2pl/irt_2pl.stan 5.9 5.83 1.01 1.2% faster
performance.compilation 88.29 87.48 1.01 0.91% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.69 8.62 1.01 0.86% faster
pkpd/one_comp_mm_elim_abs.stan 31.87 30.05 1.06 5.71% faster
sir/sir.stan 127.0 136.13 0.93 -7.19% slower
gp_regr/gen_gp_data.stan 0.04 0.04 0.98 -1.74% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 2.96 1.01 1.0% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.41 0.99 -1.21% slower
arK/arK.stan 1.86 2.54 0.73 -36.61% slower
arma/arma.stan 0.82 0.91 0.9 -10.98% slower
garch/garch.stan 0.52 0.6 0.87 -14.74% slower
Mean result: 0.968685941601

Jenkins Console Log
Blue Ocean
Commit hash: 2b2bee2


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.48 3.49 1.0 -0.16% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.93 -7.56% slower
eight_schools/eight_schools.stan 0.09 0.09 1.01 1.29% faster
gp_regr/gp_regr.stan 0.14 0.14 1.01 1.27% faster
irt_2pl/irt_2pl.stan 5.04 5.12 0.98 -1.56% slower
performance.compilation 89.51 88.78 1.01 0.81% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.22 8.46 0.97 -2.96% slower
pkpd/one_comp_mm_elim_abs.stan 30.87 31.15 0.99 -0.92% slower
sir/sir.stan 120.23 128.63 0.93 -6.99% slower
gp_regr/gen_gp_data.stan 0.03 0.03 1.0 0.41% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.03 2.99 1.01 1.09% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.37 1.01 0.57% faster
arK/arK.stan 2.02 2.02 1.0 0.13% faster
arma/arma.stan 0.26 0.25 1.03 2.69% faster
garch/garch.stan 0.6 0.6 1.0 -0.3% slower
Mean result: 0.992690362104

Jenkins Console Log
Blue Ocean
Commit hash: 03ab504


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Copy link
Collaborator

@nhuurre sorry for not coming back to this, is this ready to be looked at again?

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.41 3.46 0.99 -1.41% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 -0.39% slower
eight_schools/eight_schools.stan 0.09 0.09 1.0 -0.46% slower
gp_regr/gp_regr.stan 0.14 0.14 0.99 -0.98% slower
irt_2pl/irt_2pl.stan 5.16 5.12 1.01 0.76% faster
performance.compilation 91.65 89.41 1.02 2.44% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.13 8.14 1.0 -0.15% slower
pkpd/one_comp_mm_elim_abs.stan 29.7 29.32 1.01 1.27% faster
sir/sir.stan 121.57 125.58 0.97 -3.29% slower
gp_regr/gen_gp_data.stan 0.03 0.03 1.0 -0.4% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.95 2.97 0.99 -0.66% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.41 0.96 -4.25% slower
arK/arK.stan 2.05 2.08 0.99 -1.15% slower
arma/arma.stan 0.27 0.27 1.0 -0.37% slower
garch/garch.stan 0.65 0.65 1.01 0.78% faster
Mean result: 0.994772302881

Jenkins Console Log
Blue Ocean
Commit hash: 9fc569f


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@syclik
Copy link
Member

syclik commented Oct 21, 2021

@nhuurre and @SteveBronder, I'm marking it as a draft; there hasn't been a response in 28 days.

@syclik syclik marked this pull request as draft October 21, 2021 14:18
@syclik
Copy link
Member

syclik commented Apr 29, 2022

Closing this PR for now. @nhuurre, if this is still active, please reopen.

@syclik syclik closed this Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants