
Storage abstraction layer #3397

@0xmzk

Description


Proposed refactoring or deprecation

Introduce a real storage abstraction layer for Aim’s backend storage, so that the core repository/query layer is not tightly coupled to the current RocksDB-based implementation.

The goal is not to remove RocksDB or force a different default. The goal is to make the storage architecture extensible enough that the core maintainers can continue using and supporting RocksDB, while the community can implement alternative backends where needed.

Motivation

A number of existing issues suggest that the current storage architecture is creating operational and scaling pain, while also making it difficult for users to adopt alternative backends without forking or large internal changes.

Relevant issues include:

There are also related requests for storage pluggability on the artifact/object side:

In my own investigation of a Too many open files failure, the problem did not appear to be a simple OS limit issue: one Aim worker still had roughly 1,000 regular files open, most of them .sst files under .aim/meta/chunks/*.
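For anyone reproducing this kind of investigation, a small sketch of how open files can be counted per process on Linux via /proc (the function name and filtering are my own, not part of Aim):

```python
import os

def count_open_files(pid: int, suffix: str = "") -> int:
    """Count path-backed files the given process has open, optionally
    filtered by filename suffix (e.g. ".sst"). Linux-only: reads /proc."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        # Descriptors like "socket:[123]" or "pipe:[456]" don't start with "/"
        if target.startswith("/") and target.endswith(suffix):
            count += 1
    return count

# Example: count .sst files held open by the current process
print(count_open_files(os.getpid(), ".sst"))
```

Pointing this at a stuck Aim worker's PID makes it easy to see whether the descriptor budget is being consumed by chunk-local .sst files rather than sockets or pipes.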

From reading the code, this appears to be tied to the current architecture:

  • there is a generic container interface in aim/storage/container.py
  • but Repo still directly imports and instantiates RocksContainer / RocksUnionContainer
  • run storage is bound to chunk-local trees
  • union read paths enumerate and open chunk DBs
  • max_open_files=-1 is set in the RocksDB container implementation

Taken together, this suggests that RocksDB is not just the default backend, but a core architectural assumption in the current implementation.

Pitch

I would like to propose introducing a proper abstraction boundary above the current low-level container API, at the repository/storage-factory level.

Concretely, this would ideally mean:

  • Repo and higher-level query/storage paths depend on a backend interface rather than directly on Rocks-specific classes
  • RocksDB remains a first-class default backend
  • users are not forced into the current chunk-local RocksDB layout as the only practical architecture
  • the community can implement alternative backends for their own needs without requiring the maintainers to replace the default storage engine
  • the same architectural principle can be applied to object/artifact storage as well

This would let the core maintainers keep the current storage model where it works well, while making Aim more adaptable for users whose workloads would benefit from a different backend.
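To make the proposed boundary concrete, here is a minimal sketch of what such a backend interface could look like as a structural Protocol, together with a toy in-memory implementation. All names here are illustrative assumptions for discussion, not Aim's actual container API:

```python
from typing import Dict, Iterator, Optional, Protocol, Tuple

class StorageBackend(Protocol):
    """Illustrative backend interface; method names are hypothetical,
    not Aim's actual container API."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def set(self, key: bytes, value: bytes) -> None: ...
    def delete(self, key: bytes) -> None: ...
    def items(self, prefix: bytes = b"") -> Iterator[Tuple[bytes, bytes]]: ...
    def close(self) -> None: ...

class InMemoryBackend:
    """Toy reference implementation, useful for tests and as a
    conformance target for community-built backends."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def items(self, prefix: bytes = b"") -> Iterator[Tuple[bytes, bytes]]:
        # Sorted iteration mimics the ordered-key semantics RocksDB provides
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]

    def close(self) -> None:
        self._data.clear()
```

A RocksDB-backed class implementing the same Protocol would then be the default, and Repo would only ever see the interface.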

Additional context

Files that seem especially relevant:

  • aim/storage/container.py
  • aim/storage/rockscontainer.pyx
  • aim/storage/union.pyx
  • aim/sdk/repo.py
  • aim/sdk/base_run.py
  • aim/sdk/index_manager.py

Relevant code observations:

  • aim/storage/container.py provides a generic storage interface
  • aim/sdk/repo.py directly imports and constructs RocksContainer / RocksUnionContainer
  • aim/sdk/base_run.py binds run data to chunk-local trees under meta/chunks/<run> and seqs/chunks/<run>
  • aim/storage/union.pyx enumerates and opens chunk databases for read access
  • aim/storage/rockscontainer.pyx sets max_open_files=-1
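The observations above point at a natural dependency-inversion seam: instead of Repo importing RocksContainer / RocksUnionContainer directly, it could resolve a backend through a small factory registry. A hedged sketch of that pattern (the registry and function names are hypothetical, not existing Aim code):

```python
from typing import Callable, Dict

# Hypothetical registry mapping backend names to factory callables.
# Under this sketch, Repo would call open_container("rocksdb", path)
# instead of constructing RocksContainer itself.
_BACKEND_FACTORIES: Dict[str, Callable[..., object]] = {}

def register_backend(name: str, factory: Callable[..., object]) -> None:
    """Register a backend factory under a short name (e.g. 'rocksdb')."""
    _BACKEND_FACTORIES[name] = factory

def open_container(name: str, path: str, read_only: bool = False) -> object:
    """Resolve a backend by name; unknown names fail with a clear error."""
    try:
        factory = _BACKEND_FACTORIES[name]
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None
    return factory(path, read_only=read_only)
```

With something like this in place, shipping RocksDB as the registered default keeps current behavior unchanged, while a community backend only needs to register its own factory.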

I think a real abstraction layer here would be valuable even if no new backend ships immediately, because it would reduce coupling and make future storage work much easier for both maintainers and the community.

Metadata

Labels: type / code-health (Issue type: suggest a code improvement, i.e. refactoring, deprecation, etc.)