Improve thread storage from std::array to std::vector with dynamic growth #2542
+29
−13
Using: ROCm/timemory#21
Documentation: Update DS for Thread storage - AGS ML SW SDK Team - Confluence
Motivation
The conversion from std::array to std::vector with dynamic growth addresses several key limitations:
Eliminates Fixed Thread Limits: The original std::array implementation imposed a hard limit of 4096 threads. Applications exceeding this limit would crash or exhibit undefined behavior.
Reduces Memory Footprint: With std::array, every storage instance allocated memory for 4096 thread slots upfront, regardless of actual thread count. The std::vector approach starts with the configured capacity and only grows when needed.
Enables Scalability: Modern HPC and GPU workloads can spawn thousands of threads. The geometric growth strategy (doubling capacity) allows the system to adapt to actual thread usage patterns without recompilation.
Maintains Performance: The implementation uses double-checked locking with atomics for the fast path. Most accesses only read an atomic variable without acquiring locks, preserving the performance characteristics of the fixed array while adding flexibility.
Thread Safety: The std::atomic<size_t> capacity tracking with memory_order_acquire/release semantics ensures thread-safe reads, while std::mutex protects the rare resize operations when capacity expansion is needed.
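The double-checked growth described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: growable_storage, slot_t, and the member names are hypothetical, and only the name ensure_capacity comes from the description (its real signature may differ).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the per-thread payload; not a type from the PR.
struct slot_t
{
    long value = 0;
};

// Hypothetical growable storage; all names here are illustrative only.
class growable_storage
{
public:
    explicit growable_storage(std::size_t initial)
    : m_slots(initial)
    , m_capacity(initial)
    {
        assert(initial > 0 && "doubling needs a non-zero starting capacity");
    }

    // Make sure index `idx` is covered, growing geometrically if required.
    void ensure_capacity(std::size_t idx)
    {
        // Fast path: a single atomic load, no lock taken.
        if(idx < m_capacity.load(std::memory_order_acquire)) return;

        // Slow path: serialize growth and re-check under the lock, because
        // another thread may already have grown the vector while we waited.
        std::lock_guard<std::mutex> lk{ m_mutex };
        std::size_t                 cap = m_capacity.load(std::memory_order_relaxed);
        if(idx < cap) return;

        while(cap <= idx)
            cap *= 2;  // geometric growth (doubling)
        m_slots.resize(cap);

        // Publish the new capacity only after the resize has completed.
        m_capacity.store(cap, std::memory_order_release);
    }

    std::size_t capacity() const { return m_capacity.load(std::memory_order_acquire); }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        return m_slots.size();
    }

private:
    mutable std::mutex       m_mutex;
    std::vector<slot_t>      m_slots;
    std::atomic<std::size_t> m_capacity;
};
```

Starting from 4 slots, a call to ensure_capacity(100) in this sketch doubles the capacity until it exceeds the index (4, 8, 16, 32, 64, 128). The sketch guards only the resize itself. Indexing a std::vector from one thread while another thread reallocates it is still a data race in C++, so real code has to keep element access from overlapping with growth.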
This pattern is applied consistently across both the timemory library (types.hpp) and rocprofiler-systems (counters.hpp) to provide uniform dynamic thread storage throughout the profiling stack.
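On the read side, the Technical Details mention a get_storage that checks the atomic capacity and returns nullptr for invalid indices. A rough sketch of that kind of bounds-checked lookup is shown here, assuming a vector of owning pointers. slot_table and counter_slot are hypothetical names, and the actual get_storage in counters.hpp may take different parameters or return a different type.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload; the real tracker stores something richer.
struct counter_slot
{
    long samples = 0;
};

// Illustrative container: a slot vector plus its published capacity.
struct slot_table
{
    std::vector<std::unique_ptr<counter_slot>> slots;
    std::atomic<std::size_t>                   capacity{ 0 };

    explicit slot_table(std::size_t n)
    : slots(n)
    {
        assert(n > 0 && "table needs at least one slot");
        for(auto& s : slots)
            s = std::make_unique<counter_slot>();
        // Publish the capacity only once every slot is populated.
        capacity.store(n, std::memory_order_release);
    }

    // Return the slot for `idx`, or nullptr when `idx` lies beyond the
    // published capacity, instead of indexing out of bounds.
    counter_slot* get_storage(std::size_t idx) const
    {
        if(idx >= capacity.load(std::memory_order_acquire)) return nullptr;
        return slots[idx].get();
    }
};
```

Callers in this sketch are expected to test the returned pointer before dereferencing it, which turns an out-of-range thread index into a recoverable condition rather than undefined behavior.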
Technical Details
This pull request refactors the thread-local storage mechanism for counter_data_tracker in the rocprofiler SDK, improving scalability and thread safety. The main change is switching from a fixed-size array to a dynamically resizing vector, allowing for more flexible thread support and safer concurrent access.
Thread-local storage scalability and safety
Changed storage_array_t from a fixed-size std::array to a dynamically resizing std::vector, enabling support for more threads beyond the initial maximum and removing the need for a hard limit.
Added capacity tracking with a std::atomic<size_t> and a std::mutex, with a new ensure_capacity method to handle geometric growth of the vector when higher thread indices are accessed.
Calls ensure_capacity before accessing the storage vector, ensuring safe expansion and avoiding out-of-bounds errors.
Uses std::fill for compatibility with std::vector.
In get_storage, added a thread-safe check using the atomic capacity to prevent out-of-bounds access and return nullptr for invalid indices.
Submodule update
Updated the timemory submodule to a newer commit, likely to incorporate upstream improvements or fixes.
JIRA ID
TBA
Test Plan
TBA
Test Result
TBA
Submission Checklist