Improvement thread storage from std::array to std::vector with dynamic growth #2542

anujshuk-amd · 2026-01-08T18:58:54Z

Depends on: ROCm/timemory#21
Documentation: Update DS for Thread storage - AGS ML SW SDK Team - Confluence

Motivation

The conversion from std::array to std::vector with dynamic growth addresses several key limitations:

Eliminates Fixed Thread Limits: The original std::array implementation imposed a hard limit of 4096 threads. Applications exceeding this limit would crash or exhibit undefined behavior.

Reduces Memory Footprint: With std::array, every storage instance allocated memory for 4096 thread slots upfront, regardless of actual thread count. The std::vector approach starts with the configured capacity and only grows when needed.

Enables Scalability: Modern HPC and GPU workloads can spawn thousands of threads. The geometric growth strategy (doubling capacity) allows the system to adapt to actual thread usage patterns without recompilation.

Maintains Performance: The implementation uses double-checked locking with atomics for the fast path. Most accesses only read an atomic variable without acquiring locks, preserving the performance characteristics of the fixed array while adding flexibility.

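Thread Safety: The current capacity is tracked in a std::atomic<size_t>. Readers load it with memory_order_acquire and the resizing thread stores the new value with memory_order_release, so a reader that observes the new capacity also observes the completed resize. A std::mutex serializes the rare resize operations when capacity expansion is needed.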

This pattern is applied consistently across both the timemory library (types.hpp) and rocprofiler-systems (counters.hpp) to provide uniform dynamic thread storage throughout the profiling stack.
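The double-checked locking scheme described in this section can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name growable_slots, its members, and the capacity() helper are hypothetical, and only ensure_capacity, the atomic capacity, the mutex, and the 2x growth come from the description above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the per-thread slot container (name is illustrative).
template <typename T>
class growable_slots
{
public:
    explicit growable_slots(std::size_t initial_capacity)
    : m_data(initial_capacity)
    , m_capacity(initial_capacity)
    {}

    // Make sure `idx` is a valid slot index, growing geometrically if needed.
    // Assumes idx is far below SIZE_MAX, so the doubling loop cannot overflow.
    void ensure_capacity(std::size_t idx)
    {
        // First check (fast path): a single atomic load, no lock taken.
        if(idx < m_capacity.load(std::memory_order_acquire)) return;

        std::lock_guard<std::mutex> lock{ m_mutex };

        // Second check: another thread may have grown the vector
        // while this thread was waiting for the mutex.
        std::size_t cap = m_data.size();
        if(idx < cap) return;

        // Geometric growth: double until idx fits.
        std::size_t new_cap = std::max<std::size_t>(cap, 1);
        while(new_cap <= idx)
            new_cap *= 2;

        m_data.resize(new_cap);
        // Publish the new capacity only after the resize has completed.
        m_capacity.store(new_cap, std::memory_order_release);
    }

    std::size_t capacity() const { return m_capacity.load(std::memory_order_acquire); }

private:
    std::vector<T>           m_data;
    std::atomic<std::size_t> m_capacity;
    std::mutex               m_mutex;
};
```

In this sketch the atomic only makes the capacity check cheap. Any code that touches the elements themselves while another thread may be inside resize would still need its own synchronization, because std::vector::resize can reallocate the buffer.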

Technical Details

This pull request refactors the per-thread storage used by counter_data_tracker in the rocprofiler-sdk integration of rocprofiler-systems (counters.hpp), improving scalability and thread safety. The main change is switching from a fixed-size array to a dynamically resizing vector, so the number of supported threads is no longer capped at compile time and concurrent access is bounds-checked.

Thread-local storage scalability and safety

Changed storage_array_t from a fixed-size std::array to a dynamically resizing std::vector, enabling support for more threads beyond the initial maximum and removing the need for a hard limit.
Added a thread-safe capacity management system using std::atomic<size_t> and a std::mutex, with a new ensure_capacity method to handle geometric growth of the vector when higher thread indices are accessed.
Updated the assignment operator to call ensure_capacity before accessing the storage vector, ensuring safe expansion and avoiding out-of-bounds errors.
Modified the bulk assignment operator to use std::fill for compatibility with std::vector.
In get_storage, added a thread-safe check using the atomic capacity to prevent out-of-bounds access and return nullptr for invalid indices.
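The bulk assignment and the bounds-checked lookup from the list above might look roughly like this. It is a sketch under stated assumptions: tracker_storage, slots, and assign_all are hypothetical names, the element type is simplified to int, and growth is left out so that only the std::fill and nullptr behavior is shown.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tracker's storage; not the PR's actual type.
struct tracker_storage
{
    std::vector<int>         slots;
    std::atomic<std::size_t> capacity;

    explicit tracker_storage(std::size_t n)
    : slots(n, 0)
    , capacity(n)
    {}

    // Bulk assignment: every existing slot receives the same value.
    // std::fill works on any iterator range, so it suits std::vector as well.
    void assign_all(int value) { std::fill(slots.begin(), slots.end(), value); }

    // Bounds-checked lookup: compare against the atomically published
    // capacity and return nullptr instead of indexing out of range.
    int* get_storage(std::size_t idx)
    {
        if(idx >= capacity.load(std::memory_order_acquire)) return nullptr;
        return &slots[idx];
    }
};
```

Callers are then expected to test the returned pointer before dereferencing it, for example `if(auto* p = s.get_storage(idx)) { /* use *p */ }`.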

Submodule update

Updated the timemory submodule to pick up the matching std::array to std::vector change from ROCm/timemory#21.

JIRA ID

TBA

Test Plan

TBA

Test Result

TBA

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

mradosav-amd · 2026-01-12T10:13:28Z

projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.hpp

+    static std::atomic<size_t>& get_capacity()
+    {
+        static std::atomic<size_t> _cap{ max_threads };
+        return _cap;
+    }


Can we just have a static member instead of a static variable inside a function?
It would be great to avoid this style:
get_capacity().load(std::memory_order_acquire)
get_capacity().store(_v.size(), std::memory_order_release);

It is confusing because the name get_* suggests a read, but we then call store on the result, which is a write. That's contradictory.

With a static member, we can call directly:
m_capacity.load(std::memory_order_acquire)
m_capacity.store(_v.size(), std::memory_order_release);

Which would be much more readable.
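One way the suggestion above could look is a C++17 inline static data member. This is only a sketch: counter_storage_sketch, capacity(), and set_capacity() are hypothetical names, and max_threads = 4096 is taken from the initial capacity mentioned in the commit message rather than from the actual header.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical class illustrating the reviewer's suggestion; requires C++17.
struct counter_storage_sketch
{
    // Assumed initial capacity (4096 per the commit message).
    static constexpr std::size_t max_threads = 4096;

    // Inline static member: defined in the header, no out-of-class definition needed.
    static inline std::atomic<std::size_t> m_capacity{ max_threads };

    // Reads and writes now name the member directly, so a "get" never stores.
    static std::size_t capacity() { return m_capacity.load(std::memory_order_acquire); }
    static void        set_capacity(std::size_t n)
    {
        m_capacity.store(n, std::memory_order_release);
    }
};
```

A function-local static is often chosen to sidestep static initialization order problems. Here that concern is small, because std::atomic<std::size_t> initialized from a constant is constant-initialized before any dynamic initialization runs.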

…owth
- Replace fixed std::array with std::vector in counters.hpp
- Implement thread-safe dynamic resizing with geometric growth (2x)
- Add ensure_capacity() with double-checked locking pattern
- Use std::atomic<size_t> for lock-free capacity reads
- Add bounds checking in get_storage operation
- Initial capacity set to 4096, grows as needed
- Update timemory submodule to users/anujshuk/dynamic-thread-storage_ds (696a160d)

Convert thread storage from std::array to std::vector with dynamic gr…

3971d01

anujshuk-amd self-assigned this Jan 8, 2026

github-actions bot added the project: rocprofiler-systems label Jan 8, 2026

anujshuk-amd changed the title ~~Improvement Convert thread storage from std::array to std::vector with dynamic growth~~ Improvement thread storage from std::array to std::vector with dynamic growth Jan 8, 2026

anujshuk-amd mentioned this pull request Jan 8, 2026

Convert thread storage from std::array to std::vector with dynamic growth ROCm/timemory#21

Draft

1 task

systems-assistant bot added the organization: ROCm label Jan 8, 2026

anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch 2 times, most recently from 4daacdc to 40f9b5e Compare January 8, 2026 19:56

mradosav-amd reviewed Jan 12, 2026

anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch 3 times, most recently from 40f9b5e to 271064c Compare January 14, 2026 19:39

anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch from 271064c to 1359274 Compare January 14, 2026 19:42

3 participants