
[WIP] Add cms caching cuda allocator #79

Open · wants to merge 2 commits into main

Conversation

cgleggett

Imported the CMS caching CUDA allocator (based on CUB).

@cgleggett
Author

@krasznaa @stephenswat : before doing anything more, are you happy with the structure?

@cgleggett cgleggett changed the title Add cms caching cuda allocator [WIP] Add cms caching cuda allocator Apr 22, 2021
@krasznaa
Member

Hi Charles,

This is not exactly what I had in mind. The "simpler" option, as we discussed on Monday, would be that you:

  • Copy the "notcub" code more or less as-is under cuda/src/, so that none of it becomes public in vecmem::cuda's interface.
  • Create 2 or 3 memory resources, which you could call vecmem::cuda::notcub::host_memory_resource, vecmem::cuda::notcub::device_memory_resource and (potentially) vecmem::cuda::notcub::managed_memory_resource. (A rough sketch of such a resource follows below.)
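
A minimal sketch of what one of these wrapper resources could look like, assuming vecmem::memory_resource follows the usual std::pmr::memory_resource virtual interface (all names and details here are illustrative, not final):

#include <vecmem/memory/memory_resource.hpp>
#include <cstddef>

namespace vecmem::cuda::notcub {

   /// Device memory resource wrapping the (private) caching allocator
   class device_memory_resource : public vecmem::memory_resource {

   private:
      /// Hand out a block of device memory, re-using a cached block if possible
      virtual void* do_allocate(std::size_t bytes, std::size_t align) override;
      /// Return a block to the cache instead of freeing it immediately
      virtual void do_deallocate(void* ptr, std::size_t bytes, std::size_t align) override;
      /// Resources are only equal if they share the same cache
      virtual bool do_is_equal(const memory_resource& other) const noexcept override;

   }; // class device_memory_resource

} // namespace vecmem::cuda::notcub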

This still would not integrate all that well into our overall framework, but it would at least fit into it.

The best option would be to rewrite the CUB code to work with a generic underlying vecmem::memory_resource, so that the same "CUB logic" could be used on top of any type of low-level resource. This is pretty much what was done in std::pmr::synchronized_pool_resource, by the way. I believe mostly by the NVIDIA developers, actually...

But let's not get too far ahead with this. Let's just try "the first option". Can you try creating "notcub-specific" memory resources on top of this code, while keeping the "notcub" code private inside the vecmem::cuda library?

Cheers,
Attila

@krasznaa krasznaa self-requested a review April 23, 2021 07:16
@stephenswat
Member

I think Charles is explicitly asking if the directory structure and such are okay before he gets on with creating the actual memory resources. So it makes sense that those bits are not implemented yet. 🙂

The only thing I would say here is that it would indeed be better not to expose these Notcub internals to the outside world. All an outside user should be able to see are the memory resources wrapping the Notcub code, not the underlying allocator. That would mean taking some of the headers out of the include directory and moving them into the src/ directory.

Regarding the rewrite: as we discussed on Monday, that would indeed be the nicest option, but it would also be costly to implement. From what I understood, the plan here is to come up with a proof of concept for this Notcub allocator, to see if it is viable in our context. If that turns out to be the case, we can work on factoring the allocator into parts. It's interesting to see that this PR brings in two ~700-line classes. Ideally, we would factor that into one class, and the product of that class with the different low-level memory resources would bring us up to feature parity. But that can wait as far as I am concerned.

@cgleggett
Author

@stephenswat is correct - I wanted to make sure you were happy with the file layout and namespaces before adding any memory resources.

I think it's a mistake to make it entirely private to vecmem - being able to access the main device/host allocator interface would be very useful for performance benchmarking of notcub against other implementations. I saw some substantial improvements using it for the Seedfinder, though interestingly using the host allocator was as important as using the device allocator.

@krasznaa
Member

But Charles, VecMem does not work by providing allocators to its users. It works by providing memory resources. As the allocator, you are always supposed to use vecmem::polymorphic_allocator (a.k.a. std::pmr::polymorphic_allocator).

I want to be able to use CUB's memory management while still using the vecmem:: container classes, and that's only possible through the memory resources; the allocators themselves don't mean anything to the vecmem:: containers. That's why we're both suggesting not making them visible in the public interface of vecmem::cuda.
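
To illustrate the plumbing being described, a minimal sketch using the plain C++17 std::pmr names, which vecmem essentially aliases:

#include <memory_resource>
#include <vector>

int main() {
   // A vecmem::vector<int> is essentially an std::pmr::vector<int>: the
   // container stores a polymorphic_allocator, which itself only wraps a
   // pointer to whatever memory_resource it was handed.
   std::pmr::memory_resource* mr = std::pmr::new_delete_resource();
   std::pmr::polymorphic_allocator<int> alloc(mr);
   std::pmr::vector<int> v(alloc);
   v.push_back(42); // every allocation made by v goes through *mr
   return 0;
}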

As for the various names: For the memory resources I would prefer the names that I wrote before. The private code is a bit less important. Though putting everything under vecmem::cuda::notcub and possibly vecmem::cuda::notcub::details would be my preference. (The PR at the moment also uses the vecmem::cuda::allocator namespace. I would hide those classes under vecmem::cuda::notcub::details instead, in the src/ directory.)

@cgleggett
Author

Where would you then suggest placing notcub, so that it could be used both within vecmem and within acts / traccc?

@krasznaa
Member

In order to use the "notcub" code, I imagine that we'll be doing the following in traccc:

vecmem::cuda::notcub::device_memory_resource device_memory;
vecmem::vector< traccc::cell > simple_vector( &device_memory );

So basically, everywhere that I use vecmem::host_memory_resource in acts-project/traccc#23 at the moment, you would have the ability to use one of the vecmem::cuda::notcub memory resources instead. You could then compare them against using vecmem::cuda::device_memory_resource directly, or against something like:

vecmem::cuda::device_memory_resource device_bare_memory;
vecmem::binary_page_memory_resource device_memory( device_bare_memory );
vecmem::vector< traccc::cell > simple_vector( &device_memory );

Or even:

vecmem::cuda::device_memory_resource device_bare_memory;
std::pmr::synchronized_pool_resource device_memory( &device_bare_memory );
vecmem::vector< traccc::cell > simple_vector( &device_memory );

It is in this way that we can switch out the type of memory management used by a piece of code. We don't even necessarily need to recompile the code for the different tests; we can just provide a single compiled piece of code with different vecmem::memory_resource objects at runtime.
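
A sketch of that pattern, with a hypothetical run_seeding function standing in for the algorithm that gets compiled once (int is just a stand-in element type):

#include <vecmem/containers/vector.hpp>
#include <vecmem/memory/binary_page_memory_resource.hpp>
#include <vecmem/memory/cuda/device_memory_resource.hpp>

// Compiled once, against the abstract interface only.
void run_seeding(vecmem::memory_resource& mr) {
   vecmem::vector<int> data(&mr);
   // ... fill and process the data ...
}

int main() {
   vecmem::cuda::device_memory_resource direct;
   vecmem::binary_page_memory_resource pooled(direct);

   run_seeding(direct); // one bare CUDA allocation per request
   run_seeding(pooled); // same binary, allocations served from a pool
   return 0;
}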

@cgleggett
Author

The point I'm trying to make is that I would like to be able to use the notcub allocator both within and outside of vecmem. We can't do the latter if all the interfaces are private. For example, I have a version of the ACTS CUDA seedfinder that uses it, which I have integrated into ACTS, but it would be nice to be able to use it elsewhere too.

@krasznaa
Member

🤔 I really want to avoid doing this...

The language-specific libraries in VecMem all use their respective languages only privately, the idea being that these libraries should be usable without exposing the user to the underlying language directly. I really want to keep this design...

So the sort of comparison that you have in mind will have to be done somewhat differently. VecMem should only provide memory management through the C++17 memory resource interface. It's not meant to provide solutions that require user code to be compiled with one specific language.

@cgleggett
Author

Then where would you suggest putting it? We could put it in ACTS, but then there would be a dependency loop for vecmem. A separate repository for "externals" that both ACTS and vecmem can use? I'm not a fan of proliferating repositories.

@krasznaa
Member

I suggest that you put that code in your own fork of Acts for now, and compare the performance of that code with VecMem-backed code through some extra machinery. (That machinery will be necessary in this case, since you won't be able to run the seed finding from a single build of the code while just switching out the memory manager, say through a configuration parameter of the job, as you can with the code design that VecMem pushes for.)

We can have a stripped-down version of the CUB code "privately" in VecMem. I'm fully on board with that. But I just don't want to confuse the code's design even further by providing the ability, using vecmem::cuda, to use types like std::vector<Foo, vecmem::cuda::notcub::allocator>. As I wrote before, the idea is that with VecMem you would always use vecmem::vector<Foo> types in the host code.
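
For contrast, a sketch of the two patterns side by side; the notcub allocator and resource spellings are hypothetical, and the first pattern is exactly what is being argued against here:

struct Foo {}; // stand-in payload type

// Discouraged: ties user code to CUDA (and to notcub) at compile time:
//   std::vector< Foo, vecmem::cuda::notcub::allocator< Foo > > v;

// Preferred: host code only ever sees vecmem types, backed by a resource:
vecmem::cuda::notcub::device_memory_resource mr;
vecmem::vector< Foo > v( &mr );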
