Improve index memory management #73

Open
albe opened this issue Sep 22, 2019 · 3 comments
Labels
enhancement P: Index Affects the indexing layer

Comments

albe commented Sep 22, 2019

Currently, an index preallocates an array as long as the index and then fills it with data on demand. Because the internal array is untyped, only the array structure itself is allocated up front, not the entries. Still, for large indices this keeps a lot of memory allocated but unused in a couple of use cases, most notably the write-only scenario.

Ideally, an index would keep only a fixed (configurable) upper bound of entries in memory and fill that on demand.
This could be a good fit for a ring buffer.
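
A minimal sketch of what such a bounded ring buffer might look like. The class name `EntryRingBuffer`, its methods, and the 0-based positions are hypothetical and not part of the current code; a miss (`undefined`) would mean the caller has to read the entry from the index file instead.

```javascript
// Hypothetical bounded entry cache: keeps only the `capacity` most recently
// added entries in memory, no matter how long the index grows.
class EntryRingBuffer {
  constructor(capacity) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.length = 0; // total number of entries ever added (the index length)
  }

  // Store an entry and return its 0-based position in the index.
  // Once the buffer is full, this overwrites the oldest slot.
  add(entry) {
    const position = this.length++;
    this.slots[position % this.capacity] = entry;
    return position;
  }

  // Return the entry at `position`, or undefined if it was never added
  // or has already been overwritten by a newer entry.
  get(position) {
    if (position < 0 || position >= this.length) return undefined;
    if (position < this.length - this.capacity) return undefined;
    return this.slots[position % this.capacity];
  }
}
```

With this shape, memory use is bounded by `capacity` rather than by the index length, which is what the write-only scenario needs.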

Also, ideally the internal buffer would be a typed array of (u)int32, so that index data lives in a contiguous memory block. This could speed up index entry/buffer translation, since an entry would just be a typed view on the underlying buffer and reads would involve no copying.
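
To illustrate the "typed view" idea, here is a rough sketch. It assumes each entry consists of four uint32 fields; that layout, the constant `FIELDS_PER_ENTRY` and the helpers `writeEntry` and `entryView` are made up for this example. `subarray` returns a new view on the same `ArrayBuffer`, so no data is copied.

```javascript
// Assumption for this sketch: every index entry is four uint32 fields.
const FIELDS_PER_ENTRY = 4;

// One contiguous block holding up to 1024 entries.
const storage = new Uint32Array(FIELDS_PER_ENTRY * 1024);

// Copy the given field values into the slot of entry number `index`.
function writeEntry(storage, index, fields) {
  storage.set(fields, index * FIELDS_PER_ENTRY);
}

// Return a view on entry number `index`. The view shares memory with
// `storage`, so reading through it involves no copying.
function entryView(storage, index) {
  const start = index * FIELDS_PER_ENTRY;
  return storage.subarray(start, start + FIELDS_PER_ENTRY);
}
```

Note that a `Uint32Array` uses the platform's byte order, so mapping it directly onto file contents would tie the on-disk format to that byte order.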

@albe albe added the P: Index Affects the indexing layer label Oct 5, 2019
albe commented Oct 5, 2019

Index range reading should return a generator and work with the same semantics as partition reading. Hence, the underlying file abstraction could be shared between index and partition.
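
Roughly, something like the following. This is only a sketch; `readRange` and the `readAt` callback are hypothetical names, and the inclusive `from`/`until` bounds are an assumption rather than the actual partition semantics.

```javascript
// Lazily yield the items at positions from..until (both inclusive).
// `readAt` is whatever reads a single item from the underlying file, so the
// same generator could serve an index or a partition.
function* readRange(readAt, from, until) {
  for (let position = from; position <= until; position++) {
    yield readAt(position);
  }
}

// Usage: items are only read as the consumer iterates.
// for (const entry of readRange(pos => index.get(pos), 1, 100)) { ... }
```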

albe commented Aug 20, 2020

albe commented Aug 22, 2020

The feedback on the nodejs/help repo suggests this is to be expected, as typed arrays do a bit more.
After some testing with a custom implementation of a "buffer view entry", the single-access use case is faster than the current implementation by a factor of 2, but slower by a factor of 2 when a second access happens (and hence likely even more so for further accesses, i.e. it behaves badly for "cache hits" in the index reader). So the buffer reads need to be memoized eagerly (lazy memoization adds a condition to the access path).
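
For illustration, the two memoization variants might be sketched as below. The class names, the two fields `number` and `position`, and the little-endian layout are assumptions for this example, not the tested implementation.

```javascript
// Lazy variant: reads from the buffer on first access and memoizes the value.
// Every access pays for the "already read?" check.
class LazyEntry {
  constructor(buffer, offset) {
    this.buffer = buffer;
    this.offset = offset;
    this._number = undefined;
  }

  get number() {
    if (this._number === undefined) {
      this._number = this.buffer.readUInt32LE(this.offset);
    }
    return this._number;
  }
}

// Eager variant: reads all fields once in the constructor, so later accesses
// are plain property reads without any condition.
class EagerEntry {
  constructor(buffer, offset) {
    this.number = buffer.readUInt32LE(offset);
    this.position = buffer.readUInt32LE(offset + 4);
  }
}
```

The eager variant gives up the zero-copy property, since each field is copied out of the buffer when the entry is created.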
