
Orca - ORigin CAche #124

Draft

plombardi89 wants to merge 10 commits into main from phlombar/origincache

Conversation

@plombardi89 (Collaborator)

No description provided.

Comment thread design/orca/design.md
breaker** (§10.2). Sustained `ErrAuth` (default 3 consecutive) flips
`/readyz` to NotReady so load balancers drain the replica.

### 8.7 Metadata-layer singleflight
Contributor

So if I'm reading this right, the doc proposes consistent hashing of the file/blob keyspace across Orca replicas, plus in-memory pipelining to batch concurrent requests for the same blocks.

Any concerns about hotspots if we're only running a few replicas? Since we have shared storage under the cache, it's feasible to avoid partitioning (at the cost of another network round trip for locking).
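For readers unfamiliar with the scheme under discussion, rendezvous (highest-random-weight) hashing can be sketched in a few lines. This is an illustrative toy, not Orca's implementation: the function names (`score`, `owner`), the FNV-1a hash, and the key format are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes the (replica, key) pair. FNV-1a stands in for whatever
// hash the real implementation uses; any stable hash works.
func score(replica, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(replica))
	h.Write([]byte{0}) // separator so "ab"+"c" hashes differently from "a"+"bc"
	h.Write([]byte(key))
	return h.Sum64()
}

// owner picks the coordinator replica for a chunk key: every replica
// computes the same ranking independently, so concurrent fills for one
// chunk converge on one coordinator with no coordination traffic.
func owner(replicas []string, key string) string {
	best, bestScore := "", uint64(0)
	for _, r := range replicas {
		if s := score(r, key); best == "" || s > bestScore {
			best, bestScore = r, s
		}
	}
	return best
}

func main() {
	replicas := []string{"orca-0", "orca-1", "orca-2"}
	fmt.Println(owner(replicas, "bucket/blob.bin#chunk-7"))
	// Removing a replica only remaps the keys that replica owned,
	// which is the property that makes scale-down cheap.
	fmt.Println(owner([]string{"orca-0", "orca-2"}, "bucket/blob.bin#chunk-7"))
}
```

The hotspot concern raised above is real for this scheme: with few replicas, a handful of hot keys can land on one owner, which is why the later comment weighs hotspotting against keeping per-key state in memory.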

Comment thread design/orca/design.md
access-frequency-driven eviction loop. This is the recommended
posixfs path when external sweep tooling is impractical.

### 13.2 Active eviction (opt-in, access-frequency)
Contributor

Do we definitely need eviction? I assume the remote dataset might be larger than in-region storage, but it's worth confirming.

Collaborator (Author)

I think @jwilder has mentioned that the origin data set is going to be bigger than what the origin cache data store can hold, at least some of the time.

It feels like we need some kind of eviction, but maybe we need more clarity on the requirement.
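As a rough illustration of the access-frequency eviction the design section describes, the selection step could look like the sketch below. Everything here is hypothetical: the `entry` struct, the `evict` helper, and the policy of sorting by raw hit count are assumptions, not the PR's actual code (which presumably also handles decay, concurrency, and the storage delete itself).

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical in-memory record per cached chunk.
type entry struct {
	key  string
	hits int64 // access counter maintained on the read path
	size int64 // bytes on the cache store
}

// evict returns the keys to delete so total size drops to the limit,
// removing least-frequently-accessed entries first.
func evict(entries []entry, limit int64) []string {
	sort.Slice(entries, func(i, j int) bool { return entries[i].hits < entries[j].hits })
	var total int64
	for _, e := range entries {
		total += e.size
	}
	var victims []string
	for _, e := range entries {
		if total <= limit {
			break
		}
		victims = append(victims, e.key)
		total -= e.size
	}
	return victims
}

func main() {
	// 12 bytes cached, 8-byte limit: the coldest entry ("b") goes first.
	got := evict([]entry{
		{"a", 100, 4}, {"b", 2, 4}, {"c", 50, 4},
	}, 8)
	fmt.Println(got)
}
```

Keeping counters like these purely in memory only works if one replica owns each key, which is the link to the consistent-hashing discussion in the earlier thread.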

Contributor

Cool, this approach checks out then. Being able to keep the counters in memory might justify sticking with consistent hashing even if it comes at the cost of some hotspotting across replicas. So I guess that mostly addresses both of my remaining comments. 🙃

As long as we run fewer, "larger" replicas of this service I think we're good to go 👍

Add the orca origin cache binary (cmd/orca, internal/orca) that fronts
S3 / Azure Blob origins with a chunked, rendezvous-hashed, in-DC cache
backed by an S3-compatible store (LocalStack in dev, VAST or similar in
production). Includes:

- Per-replica fetch coordinator with cluster-wide singleflight collapse
  via rendezvous-hash coordinator selection plus an /internal/fill RPC
  for cross-replica fan-in.
- cluster.PeerSource abstraction (DNS-backed in production, mutable
  StaticPeerSource for tests) with Peer.Port to support multiple
  replicas sharing an IP under test.
- internal/orca/app factory exposing Start/Shutdown plus options for
  injecting alternate origin / cachestore / peer-source / internal
  handler wrap (used by tests).
- Integration suite (internal/orca/inttest, build tag integrationtest)
  driven by testcontainers-go: 7 scenarios against real LocalStack +
  Azurite covering cold/warm GET, ranged GET with chunk-boundary edge
  cases, 64-chunk multi-chunk GET, rendezvous routing, singleflight
  collapse with 64 <= origin GetRanges <= 76 bound, and a real
  membership-disagreement fallback test.
- Unit tests covering driver branches (cachestore versioning gate,
  azureblob blob-type gate, server error mapping + handler XML +
  range parser + path split + headers, chunk arithmetic + path
  determinism, config env-var fallback, manifest YAML validity).
- Deploy manifest templates (deploy/orca/) defaulting to the
  unbounded-kube namespace, and an extracted reusable
  hack/cmd/render-manifests/render package consumed by both the CLI
  and the manifest validity test.
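The "chunk arithmetic" and chunk-boundary edge cases called out above reduce to mapping a byte range onto fixed-size chunk indices. A minimal sketch, assuming a 4 MiB chunk size and a `chunksFor` helper that are illustrative rather than Orca's actual names or default:

```go
package main

import "fmt"

const chunkSize = 4 << 20 // assumed 4 MiB chunks; not Orca's real default

// chunksFor returns the inclusive [first, last] chunk indices covering
// the byte range [off, off+length). The -1 on the end offset is what
// keeps boundary-exact ranges from pulling in an extra chunk.
func chunksFor(off, length int64) (first, last int64) {
	first = off / chunkSize
	last = (off + length - 1) / chunkSize
	return first, last
}

func main() {
	// Range entirely inside chunk 0.
	fmt.Println(chunksFor(0, 100))
	// Range ending exactly on the chunk boundary stays in chunk 0.
	fmt.Println(chunksFor(0, chunkSize))
	// One byte past the boundary pulls in chunk 1.
	fmt.Println(chunksFor(0, chunkSize+1))
}
```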

Adds a `make orca-inttest` target and a parallel CI job. Docs for the
dev harness and integration suite are intentionally excluded from
this commit.
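The singleflight collapse the description mentions is, at its core, the pattern below: concurrent requests for the same chunk key share one in-flight origin fetch. This stdlib-only sketch is illustrative (the real code could equally use `golang.org/x/sync/singleflight`, and the cross-replica fan-in via `/internal/fill` adds a network hop this toy omits).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight fetch; waiters block on wg and read val.
type call struct {
	wg  sync.WaitGroup
	val string
}

// group collapses duplicate concurrent calls per key.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

// do runs fn once per in-flight key; duplicate callers wait for the
// original call and share its result instead of hitting the origin.
func (g *group) do(key string, fn func() string) string {
	g.mu.Lock()
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val
}

func main() {
	g := &group{m: map[string]*call{}}
	var fetches int64
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v := g.do("bucket/blob.bin#chunk-0", func() string {
				atomic.AddInt64(&fetches, 1)
				time.Sleep(50 * time.Millisecond) // simulate a slow origin read
				return "chunk bytes"
			})
			if v != "chunk bytes" {
				panic("wrong value")
			}
		}()
	}
	wg.Wait()
	// With overlapping callers this is typically 1 fetch rather than 8,
	// though the exact count depends on goroutine scheduling.
	fmt.Println("origin fetches:", atomic.LoadInt64(&fetches))
}
```

Combined with rendezvous-hash coordinator selection, this collapse becomes cluster-wide: every replica forwards fills for a chunk to the same owner, so the owner's in-process group deduplicates requests from the whole cluster.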
```go
// S3 driver: list at most maxResults objects under prefix.
in := &s3.ListObjectsV2Input{
	Bucket:  aws.String(b),
	Prefix:  aws.String(prefix),
	MaxKeys: aws.Int32(int32(maxResults)),
}
```

```go
// Azure Blob driver: the equivalent listing setup.
cc := a.client.ServiceClient().NewContainerClient(cName)
max := int32(maxResults)
```
3 participants