How to reproducibly sample random inputs? #111

Closed
adrhill opened this issue Sep 11, 2024 · 4 comments

Comments

@adrhill
Contributor

adrhill commented Sep 11, 2024

I'm trying to benchmark and evaluate two methods on randomly sampled inputs.
However, the structure of the random inputs strongly affects the performance of both methods. Is it possible to reproducibly sample the same inputs in two benchmark runs?

An example of such inputs would be random sparse matrices. Since these random matrices can be very ill-conditioned, I would like to evaluate both methods on the exact same sampled matrices.

using Chairmarks, SparseArrays # Chairmarks provides @b

T = Float64
n = 1000
p = 0.05 # probability of non-zero value in matrix

@b sprand(T, n, n, p) foo
@b sprand(T, n, n, p) bar

I could pass an RNG, but I guess that would sample the same matrix over and over again?

using Random # provides MersenneTwister

@b sprand(MersenneTwister(123), T, n, n, p) foo
@b sprand(MersenneTwister(123), T, n, n, p) bar

@gdalle

gdalle commented Sep 11, 2024

I just realized an easy workaround is to redefine the function we measure so that it covers all the samples:

vecfoo(v) = foo.(v)
@b [sprand(T, n, n, p) for _ in 1:10] vecfoo

@adrhill
Contributor Author

adrhill commented Sep 11, 2024

So basically the following?

inputs = [sprand(T, n, n, p) for _ in 1:10]
@b foo.($inputs)
@b bar.($inputs)

@b usually returns the minimum runtime instead of the median/mean, so I think you might get vastly different timings.
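
A minimal sketch of how the pre-generated inputs could also be made reproducible across separate benchmark runs, assuming a fixed seed is acceptable. The helper `make_inputs`, its `num_inputs` keyword and the seed `123` are hypothetical names chosen for illustration; `T`, `n` and `p` are the values from the original post. A single seeded RNG is created once and advanced across all draws, so the matrices differ from one another but come out the same on every call with the same seed:

```julia
using Random, SparseArrays

T = Float64
n = 1000
p = 0.05  # probability of a non-zero entry, as in the original post

# Hypothetical helper (not part of any package): builds one seeded RNG
# and draws all matrices from it, so the draws are distinct but repeatable.
function make_inputs(seed; num_inputs=10)
    rng = MersenneTwister(seed)
    return [sprand(rng, T, n, n, p) for _ in 1:num_inputs]
end

inputs = make_inputs(123)  # same seed gives the same list of matrices
```

Both benchmark calls can then interpolate the same `inputs`, as in `@b foo.($inputs)`. Note that the random stream for a given seed may change between Julia versions, so this is only expected to reproduce the inputs when the runs use the same Julia version.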

@LilithHafner
Owner

Yes. To benchmark the sum of the runtimes on a variety of reproducible random inputs, you can use that construction. If you want detailed statistics based on the random choices (e.g. a histogram), you can benchmark each input separately:

inputs = [sprand(T, n, n, p) for _ in 1:10]
foos = [(@b input foo seconds=.01) for input in inputs]
bars = [(@b input bar seconds=.01) for input in inputs]
ratios = [f.time/b.time for (f,b) in zip(foos, bars)]

This could let you, for example, identify specific random inputs that foo is faster on and that bar is faster on.
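
As a rough sketch of that last step, the per-input ratios can be split into the inputs each method wins on. Here `foo` and `bar` are hypothetical stand-ins for the two methods under comparison, and the seed, matrix size and number of inputs are assumptions picked to keep the run short:

```julia
using Chairmarks, Random, SparseArrays

# Hypothetical stand-ins for the two methods being compared.
foo(A) = sum(A)
bar(A) = sum(nonzeros(A))

rng = MersenneTwister(123)  # assumed seed, so the inputs are repeatable
inputs = [sprand(rng, Float64, 200, 200, 0.05) for _ in 1:5]

# Benchmark each input separately, as in the comment above.
foos = [(@b input foo seconds=0.01) for input in inputs]
bars = [(@b input bar seconds=0.01) for input in inputs]
ratios = [f.time / b.time for (f, b) in zip(foos, bars)]

# Indices of inputs where foo was faster (ratio < 1) and where bar was faster.
foo_faster = findall(<(1), ratios)
bar_faster = findall(>(1), ratios)
```

The indices in `foo_faster` and `bar_faster` point back into `inputs`, so the corresponding matrices can be inspected directly, for example to compare their sparsity patterns.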

@adrhill
Contributor Author

adrhill commented Sep 11, 2024

Thanks, this has given me plenty of ideas! :)

@adrhill adrhill closed this as completed Sep 11, 2024