Designing GPU-friendly jagged arrays #40
stephenswat started this conversation in Ideas
So, jagged arrays. Or rather, jagged vectors. They're the elephant in the room right now, and we will need to design an efficient and ergonomic interface for them if we are to make vecmem useful for the ACTS project. So, this is a brain dump about the design space of jagged vectors. I'll talk about the different design dimensions we can explore here. I'll be using the term row throughout this post to indicate the nested vectors.

Insertion patterns
The most important thing for jagged vectors is the insertion order of elements. I think we can distinguish three general categories for this dimension:

- Hard in-order: elements must be inserted in exactly the order in which they will be stored.
- Soft in-order: elements usually arrive in order, but the occasional out-of-order element is allowed.
- Out-of-order: elements can be inserted in any order.
Why does this matter? Well, it's a trade-off between effort (and compute time) on the user end versus the library end. If the insertion order is hard in-order, designing the container is very simple. However, we might then be offloading the job of inserting the elements in that order to the user, increasing work and compute time on their end. This solution is ideal if the data is already in such an order, but we might not want to put the onus of sorting the data on the user. Soft in-order is a little more complicated to implement, but not terribly so because we can ignore the performance overhead of out-of-order elements; they should be rare enough that they won't impact us much. Finally, out-of-order insertion is the most complex for the back-end, but makes programming for the user as easy as possible, since they can insert the elements in whatever order they want.
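To make the hard in-order end of this trade-off concrete, here is a minimal sketch of a jagged vector that only accepts hard in-order insertion: rows can only be opened at the end, and elements can only be appended to the current row, so everything lands in one flat buffer plus an offset table. The names (`in_order_jagged`, `begin_row`) are hypothetical, not vecmem API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a jagged vector supporting only hard in-order insertion.
// Rows are stored back-to-back in one flat buffer; an offset table
// records where each row starts. Insertion is trivial, but the burden
// of producing elements in the right order falls on the user.
template <typename T>
class in_order_jagged {
public:
    // Open a new row at the end of the container.
    void begin_row() { m_offsets.push_back(m_data.size()); }

    // Append an element to the most recently opened row.
    void push_back(const T& v) {
        assert(!m_offsets.empty() && "open a row first");
        m_data.push_back(v);
    }

    std::size_t rows() const { return m_offsets.size(); }

    // Row i is the half-open range [offsets[i], offsets[i+1]).
    std::pair<const T*, const T*> row(std::size_t i) const {
        std::size_t end =
            (i + 1 < m_offsets.size()) ? m_offsets[i + 1] : m_data.size();
        return {m_data.data() + m_offsets[i], m_data.data() + end};
    }

private:
    std::vector<T> m_data;               // all elements, rows contiguous
    std::vector<std::size_t> m_offsets;  // start index of each row
};
```

Note that with this design every row is contiguous by construction, which is exactly the access pattern discussed below.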
Access patterns
I think this one is quite clear cut: rows will usually be accessed in order. That makes sense from a parallelism point of view. We need to design our vectors in such a way that they can efficiently support this access pattern, by keeping rows contiguous as much as possible.
Baked versus unbaked vectors
This is relevant if we support out-of-order insertion. We can take two approaches here:

- Leave the vector unbaked: access the insertion-time data structure directly.
- Bake the vector: once insertion finishes, convert it in a one-off step into a layout optimized for access.
Baking the vector obviously incurs a little overhead but might be worth it if the runtime of accessing members of the vector is much higher than the time spent constructing it.
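A baking step might look like the following sketch: rows are staged in per-row `std::vector`s (cheap to insert into in any order) and then compressed once into a flat buffer plus an offset table. The `bake` function and `baked_jagged` type are hypothetical names; the one-off copy is the baking overhead mentioned above, linear in the total number of elements.

```cpp
#include <cstddef>
#include <vector>

// Access-optimized ("baked") form: one contiguous data buffer plus an
// offset table with rows + 1 entries, so row i spans
// [offsets[i], offsets[i+1]).
template <typename T>
struct baked_jagged {
    std::vector<T> data;
    std::vector<std::size_t> offsets;
};

// One-off compression of the insertion-friendly staging structure into
// the access-friendly baked layout. Cost: O(total elements).
template <typename T>
baked_jagged<T> bake(const std::vector<std::vector<T>>& staging) {
    baked_jagged<T> out;
    out.offsets.reserve(staging.size() + 1);
    std::size_t total = 0;
    for (const auto& row : staging) total += row.size();
    out.data.reserve(total);
    for (const auto& row : staging) {
        out.offsets.push_back(out.data.size());
        out.data.insert(out.data.end(), row.begin(), row.end());
    }
    out.offsets.push_back(out.data.size());
    return out;
}
```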
Internal ordering
Does it matter if the columns and rows are kept in-order? That is to say, should the elements be accessed in the same order they are inserted? If they need only be set-like, we might be able to simplify the code a little bit, since we don't need to keep the order consistent.
Row headers
I can imagine we might want to keep a little bit of additional information about each row in a header at the start of it, such as a module ID. We might want to take this into account when designing our internal data structures.
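As an illustration, per-row metadata could be folded into the offset table itself, so each entry carries both the row's extent and its header fields. The struct below is purely a sketch; the module ID stands in for whatever per-row information we end up needing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row header: per-row metadata stored next to the row's
// extent in the flat data buffer. Keeping all headers in their own
// contiguous array keeps row lookups cache-friendly while the payload
// lives in a separate flat array.
struct row_header {
    std::uint64_t module_id;  // example metadata field
    std::size_t offset;       // start of the row in the flat data array
    std::size_t size;         // number of elements in the row
};

template <typename T>
struct jagged_with_headers {
    std::vector<row_header> headers;
    std::vector<T> data;
};
```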
Relying on the standard containers
Right now, we rely quite heavily on standard library containers with a different back-end for the memory management. This works well for one-dimensional data, but it might not be ideal for two-dimensional data where we have two layers of standard containers, since we can only transparently abstract away one layer on the memory management layer. We'd have to preprocess the vector to turn a vecmem::vector<vecmem::vector<T>> into a more usable format. The question is how much this preprocessing step would cost.

Memory layout
This is where everything comes together, and the requirements in terms of insertion and access patterns come into play. I see a few possibilities here, but this is definitely not a definitive list:

- A std::vector-like mechanism of inserting elements in contiguous blocks, and then moving a block to a larger area when we run out of space. This can ensure that rows are contiguous for locality, but it makes insertion a little more expensive.