Skip to content

Split pending chunks in the ResourceAwarePool? #1020

@kab163

Description

@kab163

In the case where a pending chunk is reused in the allocate_resource function, further on in the function, the split_chunk used to be marked as free and added to the free map, BUT the some resource is still using that memory. We don't want it to be marked free and added to the free map because if someone allocates from the free map and gets that split_chunk before the GPU completes, you have a data race - both the old GPU operation and the new allocation are accessing the same memory. So, we decided to add the "was_pending" boolean so that we could track this and then do something when splitting the chunk to avoid problems...

So one solution may be just avoiding the splitting of a reused pending chunk all together:

142   if ((rounded_bytes != chunk->size) && !was_pending) { // Don't split a pending chunk that's being reused
.... // all the normal split logic
160   } 

but that could lead to fragmentation, or split the reused pending chunk but add it to the pending_map:

142   if (rounded_bytes != chunk->size) {
.....       // all the normal split logic, BUT:
159     // We are actually splitting up a pending chunk that is being reused, so split chunk is pending too
160     if (was_pending) {
161       split_chunk->pending_map_it = m_pending_map.insert({std::optional<Resource>(r), split_chunk});
162       split_chunk->free = false;
163     } else { // Chunk we are splitting up is not pending, so split_chunk is free
164       split_chunk->size_map_it = m_free_map.insert(std::make_pair(remaining, split_chunk));
165       split_chunk->free = true;
166     }

but now we need to consider whether some things we never needed to before since this is a new state. For example:

Maybe the split chunk should inherit the event from the original chunk:

if (was_pending) {
    split_chunk->pending_map_it = m_pending_map.insert({std::optional<Resource>(r), split_chunk});
    split_chunk->free = false;
    split_chunk->event = chunk->event;  // Add this - both chunks share the same event
}

If both the original chunk and the split chunk were part of the same GPU operation, they need the same event to know when that operation completes - right?

From Claude:

The original chunk's event should also be preserved when you reuse it from pending. Currently you erase it from pending and set pending_map_it = end(), but make sure you're not accidentally clearing chunk->event. The event still needs to be checked if someone tries to deallocate before the GPU operation completes.

This issue is to track what happens with this splitting of a pending chunk. Once we get users we can reassess if this is an issue or not.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions