@anujshuk-amd
Contributor

@anujshuk-amd anujshuk-amd commented Jan 8, 2026

Uses: ROCm/timemory#21
Documentation: Update DS for Thread storage - AGS ML SW SDK Team - Confluence

Motivation

The conversion from std::array to std::vector with dynamic growth addresses several key limitations:

Eliminates Fixed Thread Limits: The original std::array implementation imposed a hard limit of 4096 threads. Applications exceeding this limit would crash or exhibit undefined behavior.

Reduces Memory Footprint: With std::array, every storage instance allocated memory for 4096 thread slots upfront, regardless of actual thread count. The std::vector approach starts with the configured capacity and only grows when needed.

Enables Scalability: Modern HPC and GPU workloads can spawn thousands of threads. The geometric growth strategy (doubling capacity) allows the system to adapt to actual thread usage patterns without recompilation.

Maintains Performance: The implementation uses double-checked locking with atomics for the fast path. Most accesses only read an atomic variable without acquiring locks, preserving the performance characteristics of the fixed array while adding flexibility.

Thread Safety: The std::atomic<size_t> capacity tracking with memory_order_acquire/release semantics ensures thread-safe reads, while std::mutex protects the rare resize operations when capacity expansion is needed.

This pattern is applied consistently across both the timemory library (types.hpp) and rocprofiler-systems (counters.hpp) to provide uniform dynamic thread storage throughout the profiling stack.
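The double-checked locking and geometric growth described above can be sketched roughly as follows. This is a minimal illustration, not the code from types.hpp or counters.hpp: the class name `growable_slots` and its members are hypothetical, and only the general technique (atomic capacity read on the fast path, mutex-protected doubling on the slow path) is taken from the description.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the per-thread storage table (assumption,
// not the actual timemory or rocprofiler-systems type).
template <typename T>
class growable_slots
{
public:
    explicit growable_slots(std::size_t initial)
    : m_data(initial)
    , m_capacity(initial)
    {}

    // Make sure index `idx` is valid, doubling the capacity if it is not.
    void ensure_capacity(std::size_t idx)
    {
        // Fast path: a single atomic load, no lock taken.
        if(idx < m_capacity.load(std::memory_order_acquire)) return;

        // Slow path: serialize all resizes behind the mutex.
        std::lock_guard<std::mutex> lk{ m_mutex };

        // Second check: another thread may already have grown the vector.
        std::size_t cap = m_data.size();
        if(idx < cap) return;

        // Geometric growth: double until idx fits.
        std::size_t new_cap = (cap == 0) ? 1 : cap;
        while(new_cap <= idx)
            new_cap *= 2;

        m_data.resize(new_cap);
        // Publish the new capacity only after the resize has completed.
        m_capacity.store(new_cap, std::memory_order_release);
        assert(idx < m_data.size());
    }

    std::size_t capacity() const
    {
        return m_capacity.load(std::memory_order_acquire);
    }

private:
    std::vector<T>           m_data;
    std::atomic<std::size_t> m_capacity;
    std::mutex               m_mutex;
};
```

With an initial capacity of 4, `ensure_capacity(4)` grows the table to 8 and `ensure_capacity(100)` grows it to 128. Note that `std::vector::resize` may reallocate and invalidate references to existing elements, so any element access that can run concurrently with a resize needs its own synchronization; the sketch covers the capacity check only.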

Technical Details

This pull request refactors the thread-local storage mechanism for counter_data_tracker in the rocprofiler SDK, improving scalability and thread safety. The main change is switching from a fixed-size array to a dynamically resizing vector, allowing for more flexible thread support and safer concurrent access.

Thread-local storage scalability and safety

  • Changed storage_array_t from a fixed-size std::array to a dynamically resizing std::vector, enabling support for more threads beyond the initial maximum and removing the need for a hard limit.
  • Added a thread-safe capacity management system using std::atomic<size_t> and a std::mutex, with a new ensure_capacity method to handle geometric growth of the vector when higher thread indices are accessed.
  • Updated the assignment operator to call ensure_capacity before accessing the storage vector, ensuring safe expansion and avoiding out-of-bounds errors.
  • Modified the bulk assignment operator to use std::fill for compatibility with std::vector.
  • In get_storage, added a thread-safe check using the atomic capacity to prevent out-of-bounds access and return nullptr for invalid indices.
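The bounds-checked lookup and the bulk assignment from the last two bullets might look something like the sketch below. The names `storage_table`, `tracker_storage`, `set`, and `fill` are hypothetical placeholders rather than the identifiers used in counters.hpp. The switch to `std::fill` is needed because `std::array` provides a `fill` member function while `std::vector` does not.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the per-thread tracker object (assumption).
struct tracker_storage
{};

class storage_table
{
public:
    explicit storage_table(std::size_t n)
    : m_slots(n, nullptr)
    , m_capacity(n)
    {}

    // Return nullptr for an out-of-range index instead of reading past
    // the end of the vector.
    tracker_storage* get_storage(std::size_t idx) const
    {
        if(idx >= m_capacity.load(std::memory_order_acquire)) return nullptr;
        return m_slots[idx];
    }

    // Single-slot assignment; the caller is expected to have ensured capacity.
    void set(std::size_t idx, tracker_storage* p)
    {
        assert(idx < m_slots.size());
        m_slots[idx] = p;
    }

    // Bulk assignment: std::fill works on any iterator range, so it
    // replaces the std::array-only fill() member.
    void fill(tracker_storage* p)
    {
        std::fill(m_slots.begin(), m_slots.end(), p);
    }

private:
    std::vector<tracker_storage*> m_slots;
    std::atomic<std::size_t>      m_capacity;
};
```

A caller that receives `nullptr` from `get_storage` can treat the thread index as having no storage yet, rather than dereferencing an invalid slot.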

Submodule update

  • Updated the timemory submodule to a newer commit to pick up the matching dynamic thread storage change (ROCm/timemory#21).

JIRA ID

TBA

Test Plan

TBA

Test Result

TBA

Submission Checklist

@anujshuk-amd anujshuk-amd self-assigned this Jan 8, 2026
@anujshuk-amd anujshuk-amd changed the title from "Improvement Convert thread storage from std::array to std::vector with dynamic growth" to "Improvement thread storage from std::array to std::vector with dynamic growth" Jan 8, 2026
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch 2 times, most recently from 4daacdc to 40f9b5e Compare January 8, 2026 19:56
Comment on lines 144 to 148
static std::atomic<size_t>& get_capacity()
{
    static std::atomic<size_t> _cap{ max_threads };
    return _cap;
}
Contributor

Can we just have a static member instead of a static variable inside a function?
It would be great to avoid this style:
    get_capacity().load(std::memory_order_acquire)
    get_capacity().store(_v.size(), std::memory_order_release);

It is confusing because the name get_* suggests a read, yet we then call store on the result, which is a write. That is contradictory.

With a static member, we can call directly:
    m_capacity.load(std::memory_order_acquire)
    m_capacity.store(_v.size(), std::memory_order_release);

That would be much more readable.
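For illustration, the static-member variant suggested here could be written with a C++17 inline static data member, roughly as below. The struct name `thread_storage_sketch` and the helpers `capacity` and `publish_capacity` are hypothetical; `max_threads` and `m_capacity` follow the names used in this thread, and the value 4096 is the initial capacity stated in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical container for the capacity counter (assumption, not the
// actual class in counters.hpp).
struct thread_storage_sketch
{
    static constexpr std::size_t max_threads = 4096;

    // C++17 inline static member: defined in-class, so no out-of-line
    // definition is needed in a .cpp file.
    static inline std::atomic<std::size_t> m_capacity{ max_threads };

    // Read side: acquire load of the published capacity.
    static std::size_t capacity()
    {
        return m_capacity.load(std::memory_order_acquire);
    }

    // Write side: release store after a resize has completed.
    static void publish_capacity(std::size_t n)
    {
        m_capacity.store(n, std::memory_order_release);
    }
};
```

One trade-off to weigh: a function-local static is initialized on first use, which sidesteps static initialization order problems across translation units, whereas a static data member is initialized during static initialization. If the capacity could be touched from another static object's constructor, the accessor-function form may still be the safer choice, possibly under a less misleading name.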

@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch 3 times, most recently from 40f9b5e to 271064c Compare January 14, 2026 19:39
…owth

- Replace fixed std::array with std::vector in counters.hpp
- Implement thread-safe dynamic resizing with geometric growth (2x)
- Add ensure_capacity() with double-checked locking pattern
- Use std::atomic<size_t> for lock-free capacity reads
- Add bounds checking in get_storage operation
- Initial capacity set to 4096, grows as needed
- Update timemory submodule to users/anujshuk/dynamic-thread-storage_ds (696a160d)
@anujshuk-amd anujshuk-amd force-pushed the anujshuk/anujshuk-amd/dynamic-thread-storage branch from 271064c to 1359274 Compare January 14, 2026 19:42
3 participants