[WIP] Expose OpenCL optimization pass #1353
base: master
Conversation
…fopencl everywhere. Still need to figure out how to handle data assignment. Wrote the code to parse data from log_prob and make new decls for the OpenCL data
Awesome stuff!! Let me know if I can help in any way.

Ty! At this point it's mostly just brainstorming nice patterns for all this. If you can, take a look at stan-dev/stan#3219; that's a PR we are waiting on before we can merge this.

This is great! Looking forward to this!
First round - really more asking questions than anything proper review-y
```diff
 let lub_mem_pat lst =
-  let find_soa mem_pat = mem_pat = SoA in
+  let find_soa mem_pat = match mem_pat with SoA -> true | _ -> false in
```
Equivalent to `is_soa` above.
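A rough sketch of that reuse (assuming, as the original `find_soa` plus `exists` pattern suggests, that `lub_mem_pat` returns `SoA` when any pattern in the list is `SoA`):

```ocaml
(* Sketch only: reuse Mem_pattern.is_soa rather than re-defining find_soa.
   Assumes lub_mem_pat returns SoA when any element of the list is SoA. *)
let lub_mem_pat lst =
  if List.exists ~f:Mem_pattern.is_soa lst then Mem_pattern.SoA
  else Mem_pattern.AoS
```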
```ocaml
type t = AoS | SoA | OpenCL [@@deriving sexp, compare, map, hash, fold, equal]

let pp ppf = function
  | AoS -> Fmt.string ppf "AoS"
  | SoA -> Fmt.string ppf "SoA"
  | OpenCL -> Fmt.string ppf "OpenCL"

let is_soa mem = match mem with SoA -> true | _ -> false
let is_aos mem = match mem with AoS -> true | _ -> false
let is_opencl mem = match mem with OpenCL -> true | _ -> false
```
There is a relationship between these, right? All "OpenCL" memory layouts are automatically SoA, right? Should `is_soa` take this into account?
All "OpenCL" memory layouts are automatically SoA, right?
Yes, but I think we should keep them different. For instance like printing the cpp we probably want to know the difference. I need to think about a right way to describe this
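One way to have it both ways (purely a sketch, not something in the PR; `uses_soa_storage` is a made-up name) would be to keep the exact predicates and add a broader one for call sites that only care about SoA-style storage:

```ocaml
(* Sketch: keep the exact checks as-is... *)
let is_soa mem = match mem with SoA -> true | _ -> false
let is_opencl mem = match mem with OpenCL -> true | _ -> false

(* ...and add a broader predicate for call sites where OpenCL should count as SoA. *)
let uses_soa_storage mem = match mem with SoA | OpenCL -> true | AoS -> false
```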
```ocaml
let is_eigen_type st =
  match st with
  | (SVector (mem, _) | SRowVector (mem, _) | SMatrix (mem, _, _))
    when Mem_pattern.is_opencl mem ->
      false
  | SVector _ | SRowVector _ | SMatrix _ | SComplexRowVector _
   |SComplexVector _ | SComplexMatrix _ ->
      true
  | _ -> false
```
I feel a little nervous that `SizedType.is_eigen_type st` and `UnsizedType.is_eigen_type (SizedType.to_unsized st)` would return different things in some circumstances. I suppose it depends on how/where both of them are used, but I'd feel a bit more confident with an `is_eigen_type` which matches Unsized, using `is_eigen st && not (is_opencl st)` where needed.

What do you think?
Yeah, I agree it's very shaky. I think splitting it out is a better idea.
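A sketch of the split-out version (the `is_opencl_type` helper is hypothetical, not from the PR), written as if inside `SizedType`:

```ocaml
(* Sketch, inside SizedType: keep is_eigen_type aligned with the unsized check... *)
let is_eigen_type st = UnsizedType.is_eigen_type (to_unsized st)

(* ...and make the OpenCL exclusion a separate, explicit predicate. *)
let is_opencl_type st =
  match st with
  | SVector (mem, _) | SRowVector (mem, _) | SMatrix (mem, _, _) ->
      Mem_pattern.is_opencl mem
  | _ -> false
```

Call sites that really mean "Eigen type living on the CPU" would then use `is_eigen_type st && not (is_opencl_type st)`, as suggested above.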
```diff
@@ -103,13 +126,16 @@ let query_stan_math_mem_pattern_support (name : string)
           |> Result.is_ok )
         namematches in
     let is_soa ((_ : UnsizedType.returntype), _, mem) =
-      mem = Mem_pattern.SoA in
+      match requested_mem with
+      | Mem_pattern.SoA -> mem = Mem_pattern.SoA || mem = OpenCL
```
This seems like one place where the SoA and OpenCL things are considered equivalent, so I wonder if the `is_*` functions in `mem_pattern.ml` should do the same?
For checking the functions, `OpenCL` -> `SoA`, but it is not true that `SoA` -> `OpenCL`.
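That one-way implication could be captured in a single helper in the signature check; a minimal sketch (the `supports` name and the `AoS` arm are my assumptions):

```ocaml
(* Sketch: a signature tagged OpenCL also satisfies an SoA request, but an
   SoA-only signature does not satisfy an OpenCL request. *)
let supports ~(requested_mem : Mem_pattern.t) (mem : Mem_pattern.t) =
  match requested_mem with
  | Mem_pattern.AoS -> true (* assumption: AoS requests are always satisfiable *)
  | Mem_pattern.SoA -> (match mem with SoA | OpenCL -> true | AoS -> false)
  | Mem_pattern.OpenCL -> (match mem with OpenCL -> true | _ -> false)
```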
src/middle/Stan_math_signatures.ml (outdated)
Are there any functions which are SoA but not OpenCL? It seems like this was a total find-and-replace, but that also seems incorrect? (The math `opencl/` folder is much smaller than the others, no?)
Yes, for now I just did a find-and-replace, but before we merge I need to go through and strip out what is SoA and what is OpenCL.
```diff
@@ -220,6 +220,15 @@ let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
   match mem_pattern with
   | AoS -> demote_sizedtype_mem st
   | SoA -> promote_sizedtype_mem st
+  | OpenCL -> promote_sizedtype_mem st
```
OpenCL is "promoted" to SoA by this function?
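If OpenCL is meant to keep its own tag rather than collapsing to SoA, something like this might be clearer (a sketch; `replace_sizedtype_mem` is a hypothetical helper along the lines of the `promote_mem` function added below, not something in the PR):

```ocaml
(* Sketch: thread the requested pattern through instead of reusing the
   SoA-specific promotion for OpenCL. *)
let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
  match mem_pattern with
  | AoS -> demote_sizedtype_mem st
  | SoA -> promote_sizedtype_mem st
  | OpenCL -> replace_sizedtype_mem OpenCL st
```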
```ocaml
  | SRowVector (_, dim) -> SRowVector (mem_pattern, dim)
  | SMatrix (_, dim1, dim2) -> SMatrix (mem_pattern, dim1, dim2)
  | SArray (inner_type, dim) -> SArray (promote_mem mem_pattern inner_type, dim)
  | _ -> st
```
Does this need to consider things inside tuples?
If we want to allow OpenCL to use tuples, then yes.

Also, the `Decl`s having a mem_pattern tag won't work because of tuples either :( I think we do just need to tag all sized types as having a memory pattern.
Ah, that’s too bad. You could do something like what we had to do for Autodiff level and have a tuple specific variant in the type, but I wasn’t super happy with that either.
Yeah, at the end of the day I think it's going to look a little odd in some places, but I think it's fine to just add it to the sized types.
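For reference, recursing into tuples could look roughly like this (a sketch; it assumes an `STuple` constructor holding a list of inner sized types, per the tuple work, and keeps the `promote_mem` name from the diff):

```ocaml
(* Sketch: rewrite the mem tag recursively, including through tuple elements. *)
let rec promote_mem (mem_pattern : Mem_pattern.t) st =
  match st with
  | SVector (_, dim) -> SVector (mem_pattern, dim)
  | SRowVector (_, dim) -> SRowVector (mem_pattern, dim)
  | SMatrix (_, dim1, dim2) -> SMatrix (mem_pattern, dim1, dim2)
  | SArray (inner_type, dim) -> SArray (promote_mem mem_pattern inner_type, dim)
  | STuple inner_types -> STuple (List.map ~f:(promote_mem mem_pattern) inner_types)
  | _ -> st
```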
```diff
@@ -220,6 +220,15 @@ let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
   match mem_pattern with
   | AoS -> demote_sizedtype_mem st
   | SoA -> promote_sizedtype_mem st
+  | OpenCL -> promote_sizedtype_mem st
+
+let rec promote_mem (mem_pattern : Mem_pattern.t) st =
```
This seems like it should be called `replace_mem` rather than `promote`.
```diff
@@ -43,6 +43,13 @@ let apply ~default ~merge op (ind : 'a t) =
   | Between (expr_top, expr_bottom) -> merge (op expr_top) (op expr_bottom)
   | MultiIndex exprs -> op exprs
+
+let map_expr ~f = function
```
We already derive `map` for the type `t`, so this is equivalent to `Index.map`.
Yeah, this needs to be deleted.
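For context, the derived mapper already covers this; the hand-written version reduces to (a sketch):

```ocaml
(* Sketch: [@@deriving map] on 'a Index.t generates
   Index.map : ('a -> 'b) -> 'a Index.t -> 'b Index.t,
   so map_expr ~f is the same as Index.map f. *)
let map_expr ~f idx = Index.map f idx
```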
```diff
@@ -1281,7 +1333,8 @@ let settings_const b =
   ; lazy_code_motion= b
   ; optimize_ad_levels= b
   ; preserve_stability= not b
-  ; optimize_soa= b }
+  ; optimize_soa= b
+  ; optimize_opencl= false }
```
I think we'd want this to be enabled as part of `all_optimizations`, so this should be `b`.
Idk, personally this is a very specific optimization for a piece of hardware. I'd rather the user explicitly flip it on.
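For illustration, that trade-off could look roughly like this (a sketch with a simplified settings record; everything except the `optimize_opencl` field name is an assumption):

```ocaml
(* Sketch: keep optimize_opencl out of the all/none optimization presets and
   only flip it on when the user passes -fopencl explicitly. *)
type settings = {optimize_soa: bool; optimize_opencl: bool}

let settings_const b = {optimize_soa= b; optimize_opencl= false}
let enable_opencl s = {s with optimize_opencl= true}
```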
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
- Coverage   89.39%   88.95%   -0.44%
==========================================
  Files          65       65
  Lines       10607    10814     +207
==========================================
+ Hits         9482     9620     +138
- Misses       1125     1194      +69
```
… or none for opencl in optimization pass
Submission Checklist
Summary
Creates a user-exposed flag `-fopencl` that will attempt to promote `log_prob` for reverse mode such that we can run the entire thing on a GPU via OpenCL. This is about halfway there, with a few things to figure out.

The lower-level C++ MIR needs to be able to promote vectors and scalars so that they get moved over to the GPU. I think this means that we need to add a tag to arrays and scalars in the MIR for a `Mem_pattern.t`. @WardBrian, can you think of another way to do this that wouldn't require that? It would just be annoying since we would have to touch a lot of code.

The optimization pass is all-or-none, aka we either are able to move the entire `log_prob` over to the GPU or we go back to running only on the CPU. If we are doing this scheme then I think I also need to add a logger so that when a parameter fails we can give the user a reason as to why their model failed to be moved over to the GPU. (UPDATE: done, but it's just a writer to stderr.)

The current scheme right now is:

a. Do the exact same pass as the SoA optimization using the monotone framework.
b. At the end, the compiler checks whether all the parameters declared in the parameters, transformed parameters, and model block are able to go on the GPU. If so, we are good and continue; otherwise we stop here and throw an error.
c. If (b) passes, we do another pass over those blocks collecting the names of all of the data used in the model.
d. Take the data names from (c) and, in the data section, add declarations and assignments of that data over to the GPU using `to_matrix_cl`, like in the code below.

I ripped out all the previous OpenCL code and my goal right now is just to get all of those tests compiling and working correctly.
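Purely as an illustration of step (d), with simplified stand-in types (this is not the real stanc3 MIR and not the snippet referred to above), the generated declarations and assignments amount to something like:

```ocaml
(* Illustrative sketch of step (d): for each data variable used in log_prob,
   declare a GPU-side copy and assign it via to_matrix_cl. The stmt type and
   the "_opencl__" suffix are stand-ins, not the PR's actual representation. *)
type stmt =
  | Decl of {name: string; on_gpu: bool}
  | Assign of {lhs: string; rhs: string}

let gpu_copies (data_names : string list) : stmt list =
  List.concat_map data_names ~f:(fun name ->
      let cl_name = name ^ "_opencl__" in
      [ Decl {name= cl_name; on_gpu= true}
      ; Assign {lhs= cl_name; rhs= "to_matrix_cl(" ^ name ^ ")"} ])
```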
I think it would be a good idea for now to leave the `target` on the CPU, as when we write the scalar from the GPU to the CPU that's a stopping point for the async OpenCL code to know it needs to finish before passing that scalar back.

`Mem_pattern.t` now has an `OpenCL` variant that indicates a statement or expression can be used on the GPU. For functions, every function in the math library that supports OpenCL also supports the new matrix type, but not all functions that support the new matrix type support OpenCL. So for the table of available function signatures, if something is tagged `OpenCL` we assume it can support both `SoA` and `OpenCL`. For now I just tagged everything, but before we merge and start testing I need to go through the math library and see which functions actually support OpenCL.

Release notes
Allow `-fopencl`, which performs a pass on `log_prob` to attempt to promote the model to run on the GPU via OpenCL.

Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)