Designing GPU-friendly jagged arrays #40
stephenswat started this conversation in Ideas
So, jagged arrays. Or rather, jagged vectors. They're the elephant in the room right now, and we will need to design an efficient and ergonomic interface for them if we are to make vecmem useful for the ACTS project. So, this is a brain dump about the design space of jagged vectors. I'll talk about the different design dimensions we can explore here. I'll be using the term row throughout this post to indicate the nested vectors.

Insertion patterns
The most important thing for jagged vectors is the insertion order of elements. I think we can distinguish three general categories for this dimension:

- Hard in-order: elements must be inserted in exactly the order in which they will be stored.
- Soft in-order: elements usually arrive in order, but the occasional out-of-order element is allowed.
- Out-of-order: elements can be inserted in any order.
Why does this matter? Well, it's a trade-off between effort (and compute time) on the user end versus the library end. If the insertion order is hard in-order, designing the container is very simple. However, we might then be offloading the job of inserting the elements in that order to the user, increasing work and compute time on their end. This solution is ideal if the data is already in such an order, but we might not want to put the onus of sorting the data on the user. Soft in-order is a little more complicated to implement, but not terribly so because we can ignore the performance overhead of out-of-order elements; they should be rare enough that they won't impact us much. Finally, out-of-order insertion is the most complex for the back-end, but makes programming for the user as easy as possible, since they can insert the elements in whatever order they want.
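To make the hard in-order end of this trade-off concrete, here is a minimal sketch of a jagged vector that only accepts hard in-order insertion: rows can only be opened at the end, and elements can only be appended to the current row, so everything lands in one flat buffer plus an offset table. The names (`in_order_jagged`, `begin_row`) are hypothetical, not vecmem API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a jagged vector supporting only hard in-order insertion.
// Rows are stored back-to-back in one flat buffer; an offset table
// records where each row starts. Insertion is trivial, but the burden
// of producing elements in the right order falls on the user.
template <typename T>
class in_order_jagged {
public:
    // Open a new row at the end of the container.
    void begin_row() { m_offsets.push_back(m_data.size()); }

    // Append an element to the most recently opened row.
    void push_back(const T& v) {
        assert(!m_offsets.empty() && "open a row first");
        m_data.push_back(v);
    }

    std::size_t rows() const { return m_offsets.size(); }

    // Row i is the half-open range [offsets[i], offsets[i+1]).
    std::pair<const T*, const T*> row(std::size_t i) const {
        std::size_t end =
            (i + 1 < m_offsets.size()) ? m_offsets[i + 1] : m_data.size();
        return {m_data.data() + m_offsets[i], m_data.data() + end};
    }

private:
    std::vector<T> m_data;               // all elements, rows contiguous
    std::vector<std::size_t> m_offsets;  // start index of each row
};
```

Note that with this design every row is contiguous by construction, which is exactly the access pattern discussed below.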
Access patterns
I think this one is quite clear cut: rows will usually be accessed in order. That makes sense from a parallelism point of view. We need to design our vectors in such a way that they can efficiently support this access pattern, by keeping rows contiguous as much as possible.
Baked versus unbaked vectors
This is relevant if we support out-of-order insertion. We can take two approaches here:

- Leave the vector unbaked: access the insertion-time data structure directly.
- Bake the vector: once insertion finishes, convert it in a one-off step into a layout optimized for access.
Baking the vector obviously incurs a little overhead but might be worth it if the runtime of accessing members of the vector is much higher than the time spent constructing it.
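A baking step might look like the following sketch: rows are staged in per-row `std::vector`s (cheap to insert into in any order) and then compressed once into a flat buffer plus an offset table. The `bake` function and `baked_jagged` type are hypothetical names; the one-off copy is the baking overhead mentioned above, linear in the total number of elements.

```cpp
#include <cstddef>
#include <vector>

// Access-optimized ("baked") form: one contiguous data buffer plus an
// offset table with rows + 1 entries, so row i spans
// [offsets[i], offsets[i+1]).
template <typename T>
struct baked_jagged {
    std::vector<T> data;
    std::vector<std::size_t> offsets;
};

// One-off compression of the insertion-friendly staging structure into
// the access-friendly baked layout. Cost: O(total elements).
template <typename T>
baked_jagged<T> bake(const std::vector<std::vector<T>>& staging) {
    baked_jagged<T> out;
    out.offsets.reserve(staging.size() + 1);
    std::size_t total = 0;
    for (const auto& row : staging) total += row.size();
    out.data.reserve(total);
    for (const auto& row : staging) {
        out.offsets.push_back(out.data.size());
        out.data.insert(out.data.end(), row.begin(), row.end());
    }
    out.offsets.push_back(out.data.size());
    return out;
}
```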
Internal ordering
Does it matter if the columns and rows are kept in-order? That is to say, should the elements be accessed in the same order they are inserted? If they need only be set-like, we might be able to simplify the code a little bit, since we don't need to keep the order consistent.
Row headers
I can imagine we might want to keep a little bit of additional information about each row in a header at the start of it, such as a module ID. We might want to take this into account when designing our internal data structures.
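As an illustration, per-row metadata could be folded into the offset table itself, so each entry carries both the row's extent and its header fields. The struct below is purely a sketch; the module ID stands in for whatever per-row information we end up needing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row header: per-row metadata stored next to the row's
// extent in the flat data buffer. Keeping all headers in their own
// contiguous array keeps row lookups cache-friendly while the payload
// lives in a separate flat array.
struct row_header {
    std::uint64_t module_id;  // example metadata field
    std::size_t offset;       // start of the row in the flat data array
    std::size_t size;         // number of elements in the row
};

template <typename T>
struct jagged_with_headers {
    std::vector<row_header> headers;
    std::vector<T> data;
};
```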
Relying on the standard containers
Right now, we rely quite heavily on standard library containers with a different back-end for the memory management. This works well for one-dimensional data, but it might not be ideal for two-dimensional data where we have two layers of standard containers, since we can only transparently abstract away one layer on the memory management layer. We'd have to preprocess the vector to turn a vecmem::vector<vecmem::vector<T>> into a more usable format. The question is how much this preprocessing step would cost.

Memory layout
This is where everything comes together, and the requirements in terms of insertion and access patterns come into play. I see a few possibilities here, but this is definitely not a definitive list:

- A std::vector-like mechanism of inserting elements in contiguous blocks, and then moving a block to a larger area when we run out of space. This can ensure that rows are contiguous for locality, but it makes insertion a little more expensive.