Cache component instances on a Motion server, within a memory budget #265

Open
shreyashankar opened this issue Nov 30, 2023 · 4 comments
Labels: research, XL (Xtra large task, > 1 week)

@shreyashankar

Currently, component instances are flushed and shut down, but we may benefit from keeping instances alive so that their state doesn't have to be loaded again. Some things to consider:

  • There's a budget for what we can keep in memory
  • If we keep a component in memory for too long, it can become stale

For the first bullet, we can design an algorithm that prioritizes keeping large and frequently accessed components in memory.

For the second bullet, we can check our version against the latest version in Redis and, if the component is stale, refresh it.
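
Here's a minimal sketch of that staleness check, assuming each component's latest version is stored in Redis under a hypothetical version:{component_id} key; the key scheme and the reload helper are illustrative, not Motion's actual layout:

import redis

r = redis.Redis(host="localhost", port=6379)

def reload_component_state(component_id):
    # Placeholder: in Motion this would re-run the component's state load path.
    print(f"reloading state for {component_id}")

def refresh_if_stale(component_id, cached_version):
    """Compare our cached version to the latest in Redis; reload if behind."""
    latest = r.get(f"version:{component_id}")
    latest_version = int(latest) if latest is not None else cached_version
    if latest_version > cached_version:
        reload_component_state(component_id)
    return latest_version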

shreyashankar added the L Large task, maybe somewhat dreading (a week and/or refactor) label Nov 30, 2023
shreyashankar self-assigned this Nov 30, 2023
shreyashankar commented Nov 30, 2023

Algorithm Description

Consider the following:

  • Keep a vector S of scores, one score for each component instance. Scores start at 0.
  • Keep a vector T of saved load times, one for each component instance.

On request for component instance ci:

  • If ci is in the cache, increment its score by T[ci] and return.
  • If ci is not in the cache, decrement its score by 1/T[ci].
  • If the cache has room for ci, add it and return.
  • Otherwise, consider admitting ci based on its score and size: sort the cached instances by score and size, and if there is some subset C of the cache such that S[ci] > sum(S[c] for c in C) and Size(ci) < sum(Size(c) for c in C), replace C with ci.
  • Return.

In pseudocode:

Initialize S (scores) and T (saved load times) for each component instance.
S = [0, 0, ..., 0] # Scores start at 0.
T = [t1, t2, ..., tn] # Initial saved load times.

Procedure RequestComponent(ci, cache)
    If ci is in cache
        S[ci] = S[ci] + T[ci] # Cache hit: increment score.
    Else
        S[ci] = S[ci] - 1 / T[ci] # Cache miss: decrement score.
        If HasRoomFor(ci, cache)
            AddToCache(ci, cache)
        Else
            ConsiderAddingToCache(ci, cache)
        EndIf
    EndIf

Procedure HasRoomFor(ci, cache)
    Return sum(Size(c) for c in cache) + Size(ci) <= cache_size

Procedure AddToCache(ci, cache)
    cache.Add(ci)

Procedure ConsiderAddingToCache(ci, cache)
    sorted_cache = Sort(cache, S, Size)
    For each subset in AllSubsets(sorted_cache)
        If S[ci] > sum(S[c] for c in subset) and Size(ci) < sum(Size(c) for c in subset)
            ReplaceSubsetWith(ci, subset, cache)
            break
        EndIf
    EndFor

Procedure ReplaceSubsetWith(ci, subset, cache)
    For each c in subset
        cache.Remove(c)
    EndFor
    cache.Add(ci)
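
Here is a minimal runnable Python sketch of the policy above. The ScoredCache name and the dict-based size/load-time bookkeeping are illustrative assumptions, not Motion's actual API; the subset scan follows the pseudocode literally:

from itertools import combinations

class ScoredCache:
    """Sketch of the score-based caching policy described above."""

    def __init__(self, capacity, sizes, load_times):
        self.capacity = capacity
        self.size = sizes                   # Size(ci) per instance
        self.T = load_times                 # saved load time per instance
        self.S = {ci: 0.0 for ci in sizes}  # scores start at 0
        self.cache = set()

    def used(self):
        return sum(self.size[c] for c in self.cache)

    def request(self, ci):
        """Handle a request for ci; returns True on a cache hit."""
        if ci in self.cache:
            self.S[ci] += self.T[ci]        # hit: reward by saved load time
            return True
        self.S[ci] -= 1 / self.T[ci]        # miss: small penalty
        if self.used() + self.size[ci] <= self.capacity:
            self.cache.add(ci)              # room available: admit directly
        else:
            self._consider_admitting(ci)
        return False

    def _consider_admitting(self, ci):
        # Find a cached subset that scores lower than ci in total but
        # frees enough space, per the replacement condition above.
        members = sorted(self.cache, key=lambda c: (self.S[c], self.size[c]))
        for k in range(1, len(members) + 1):
            for subset in combinations(members, k):
                if (self.S[ci] > sum(self.S[c] for c in subset)
                        and self.size[ci] < sum(self.size[c] for c in subset)):
                    self.cache -= set(subset)
                    self.cache.add(ci)
                    return

Note that the subset scan is exponential in the number of cached instances; a real implementation would probably cap the subset size or evict greedily from the low-score end of the sorted list.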

@shreyashankar

Worst Case Analysis

The worst case for this algorithm occurs when requests alternate frequently between different sets of components, where each set is collectively larger than the cache. This forces the algorithm to continuously evict and load components, leading to high cumulative load times.

Here are some conditions for worst-case scenarios:

  1. Frequent Access Pattern Changes: The request pattern changes frequently, so components recently loaded into the cache are no longer requested and a new set of components (not yet in the cache) becomes the focus of incoming requests.
  2. Large Component Size Relative to Cache: Each component, or the collective size of the frequently requested components, is large relative to the cache size, requiring frequent evictions to make room for new components.
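
As a small illustration of condition 1, reusing the ScoredCache sketch above (the sizes and load times are made up): once a component piles up score in one phase, its stale score can block an entire new working set from ever being admitted, so every later request pays the full load time.

# Phase 1 entrenches "a"; phase 2's working set {b, c} never gets cached
# because S[a] stays far above the newcomers' (ever-decreasing) scores.
cache = ScoredCache(capacity=100,
                    sizes={"a": 80, "b": 60, "c": 60},
                    load_times={"a": 5.0, "b": 5.0, "c": 5.0})
for _ in range(20):
    cache.request("a")
phase2_hits = sum(cache.request(ci) for ci in ["b", "c"] * 25)
print(phase2_hits)  # 0 hits in phase 2: every request reloads from scratch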

@shreyashankar

Actually, we should implement some variant of AdaptSize: https://www.usenix.org/system/files/conference/nsdi17/nsdi17-berger.pdf

@shreyashankar

shreyashankar commented Dec 1, 2023

OK, after reading the paper, we should just implement AdaptSize. Given a request for ci, we:

  • admit ci with probability $e^{-size/c}$
  • evict with an LRU policy (pick the subset of least recently used component instances that makes room for ci)

The question becomes how to pick c. I can't really follow the code in the paper.
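
For reference, here's a minimal sketch of that policy; the class and parameter names are assumptions, not anything from the paper's code. On picking c: as I read the paper, AdaptSize models the cache as a Markov chain, uses it to predict the object hit ratio as a function of c over recent traffic, and periodically re-tunes c to the maximizing value, so a crude starting point would be to re-run that search every N requests. The tuning loop is omitted below.

import math
import random
from collections import OrderedDict

class AdaptSizeCache:
    """Sketch of AdaptSize-style admission: admit with probability
    exp(-size/c), evict via LRU. Tuning of c is left out."""

    def __init__(self, capacity, c, sizes):
        self.capacity = capacity
        self.c = c                   # admission parameter to be tuned
        self.size = sizes
        self.lru = OrderedDict()     # cached keys, oldest first

    def used(self):
        return sum(self.size[k] for k in self.lru)

    def request(self, ci):
        if ci in self.lru:
            self.lru.move_to_end(ci)     # refresh recency on a hit
            return True
        # Size-aware probabilistic admission: large instances rarely enter.
        if random.random() < math.exp(-self.size[ci] / self.c):
            # Evict least recently used instances until ci fits.
            while self.lru and self.used() + self.size[ci] > self.capacity:
                self.lru.popitem(last=False)
            self.lru[ci] = None
        return False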

shreyashankar added XL Xtra large task, > 1 week and removed L Large task, maybe somewhat dreading (a week and/or refactor) labels Dec 1, 2023