
ref(cache): Move buffering of pending envelope to ProjectCache #1879

Merged
merged 27 commits into master from feat/project-cache-queue on Mar 6, 2023

Conversation

olksdr
Contributor

@olksdr olksdr commented Feb 24, 2023

These changes move the buffering of incoming envelopes into the ProjectCache.

The current implementation still keeps a so-called queue in memory, using a HashMap with the composite key QueueKey { key, sampling_key }, where sampling_key can be the same as key if no sampling project is identified. The values for these keys are Vecs of boxed Envelopes together with their EnvelopeContexts.

Once we get an update for a project state, we check all QueueKey variants which contain the current ProjectKey, and if all the project states are cached we try to flush the buffered envelopes indexed by these QueueKeys.

Envelopes will be buffered if:

  • the project state is not fetched yet
  • the root project state is cached but the sampling project state is not fetched yet
  • the sampling project state is cached but the root project state is not fetched yet

This change also removes all buffering from the Project and reduces its responsibility: it now only keeps its own state and configuration, and envelope handling happens outside of it.
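
For illustration, below is a minimal sketch of the buffering structures described above. The type names ProjectKey, Envelope, and EnvelopeContext are stand-ins for the real Relay types, and the enqueue helper is hypothetical; the sketch only shows why each QueueKey is indexed under both of its project keys.

use std::collections::{HashMap, HashSet};

// Stand-in types; the real Relay definitions differ.
type ProjectKey = String;
struct Envelope;
struct EnvelopeContext;

/// Composite key: `sampling_key` equals `key` when no sampling project is identified.
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
struct QueueKey {
    key: ProjectKey,
    sampling_key: ProjectKey,
}

/// In-memory queue of envelopes waiting for their project states.
struct Buffer {
    /// Envelopes buffered per composite key.
    buffer: HashMap<QueueKey, Vec<(Box<Envelope>, EnvelopeContext)>>,
    /// Index from a single project key to all queue keys that mention it, so a
    /// state update for either project can locate the envelopes to flush.
    index: HashMap<ProjectKey, HashSet<QueueKey>>,
}

impl Buffer {
    fn enqueue(&mut self, key: QueueKey, envelope: Box<Envelope>, context: EnvelopeContext) {
        // Index the queue key under both the own key and the sampling key.
        self.index.entry(key.key.clone()).or_default().insert(key.clone());
        self.index.entry(key.sampling_key.clone()).or_default().insert(key.clone());
        self.buffer.entry(key).or_default().push((envelope, context));
    }
}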

@olksdr olksdr requested a review from jan-auer February 24, 2023 10:37
@olksdr olksdr self-assigned this Feb 24, 2023
@olksdr olksdr marked this pull request as ready for review February 27, 2023 08:05
@olksdr olksdr requested review from a team and jjbayer February 27, 2023 08:05
Member

@jjbayer jjbayer left a comment

The logic seems sound to me, just some general questions / code suggestions. Let me know if I made any wrong assumptions.

if let Ok(CheckedEnvelope {
    envelope: Some((envelope, envelope_context)),
    ..
}) = self
Member

Is it OK to swallow the error case here?

Contributor Author

That's how this was done before, but we can also rethink this behaviour now.

for qkey in keys.drain() {
    if f(&qkey) {
        if let Some(envelopes) = self.buffer.remove(&qkey) {
            result.extend(envelopes);
Member

It would be nice if this function could return an iterator of envelopes, something like

keys.drain().map(|qkey| {
    // ...
    envelopes.into_iter()
}).flatten()

Just need to be careful to consume the iterator when calling dequeue before garbage disposal.
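
For reference, a rough sketch of such an iterator-based dequeue, building on the stand-in Buffer types sketched under the PR description above; the signature and names are illustrative, not the actual implementation:

impl Buffer {
    /// Lazily yields the buffered envelopes for every queue key indexed under
    /// `partial_key` that satisfies `predicate`, instead of collecting them
    /// into an intermediate Vec.
    fn dequeue<'a, P>(
        &'a mut self,
        partial_key: &ProjectKey,
        mut predicate: P,
    ) -> impl Iterator<Item = (Box<Envelope>, EnvelopeContext)> + 'a
    where
        P: FnMut(&QueueKey) -> bool + 'a,
    {
        let keys = self.index.remove(partial_key).unwrap_or_default();
        let buffer = &mut self.buffer;
        keys.into_iter()
            .filter(move |qkey| predicate(qkey))
            .filter_map(move |qkey| buffer.remove(&qkey))
            .flatten()
    }
}

As noted above, the caller would have to fully consume the returned iterator before any cleanup of the underlying buffer runs.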

Contributor Author

Still have to think about this a bit.

Contributor

Maybe we can use drain_filter?

Member

@iker-barriocanal that's a nightly feature, unfortunately.

Member

This is not a blocker for the current PR, but generally speaking, if all you want to do is iterate over a sequence once, there is no reason to copy its data over to a new vector.

For our case specifically, once we change to disk spooling, we will need some form of batching anyway, because the dequeued data will be too large to copy into one big vector.

Contributor Author

You're right.
I also think it is not important for now, but we will definitely have to change this behaviour once we start handling disk access and might have many more envelopes in the persistent buffer.

Contributor

@iker-barriocanal iker-barriocanal left a comment

I'm a bit confused by this PR, so these are mostly questions.

Comment on lines +22 to +23
#[cfg(feature = "processing")]
use crate::actors::processor::EnvelopeProcessor;
Contributor

Why is this only targeting processing relays?

Contributor Author

We have the following in the code:

#[cfg(feature = "processing")]
if !was_rate_limited && config.processing_enabled() {
    // If there were no cached rate limits active, let the processor check redis:
    EnvelopeProcessor::from_registry().send(RateLimitFlushBuckets {
        bucket_limiter,
        partition_key,
    });
    return;
}

which already guards that code behind the feature flag, so the import must be gated as well; otherwise the compiler will complain about an unused import when the feature is disabled.
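
As a generic illustration of the pattern (not Relay code, and `fancy` is a made-up feature name): when the only use of an import sits behind a cfg attribute, the import itself has to carry the same attribute, otherwise it is flagged as unused in builds without the feature.

// Compiled only when the `fancy` feature is enabled.
#[cfg(feature = "fancy")]
use std::collections::BTreeMap;

pub fn run(enabled: bool) {
    #[cfg(feature = "fancy")]
    if enabled {
        // This block, and therefore the import above, only exists with `fancy` on.
        let map: BTreeMap<u32, u32> = BTreeMap::new();
        drop(map);
    }
    let _ = enabled;
}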

Comment on lines 404 to 405
self.index.entry(key.key).or_default().insert(key);
self.index.entry(key.sampling_key).or_default().insert(key);
Contributor

I don't understand the purpose of the additional entry with key.sampling_key. Why are we doing that?


// We return false if project is not cached or its state is invalid.
self.projects
    .get(&queue_key.sampling_key)
Contributor

Why queue_key.sampling_key, instead of queue_key.key?

@olksdr olksdr requested a review from jjbayer March 2, 2023 15:03
{
    let mut result = Vec::new();

    let mut keys = self.index.remove(partial_key).unwrap_or_default();
Member

Suggested change
    let mut keys = self.index.remove(partial_key).unwrap_or_default();
    let mut queue_keys = self.index.remove(partial_key).unwrap_or_default();

pub fn dequeue<P>(
    &mut self,
    partial_key: &ProjectKey,
    f: P,
Member

Suggested change
    f: P,
    predicate: P,

Comment on lines +686 to +690
let sampling_state = sampling_key.and_then(|key| {
    self.get_or_create_project(key)
        .get_cached_state(envelope.meta().no_cache())
        .filter(|st| !st.invalid())
});
Member

This can be moved inside the `if let Some(state) =` block.

Member

Bumping this comment. There's no need to get_or_create_project the sampling state when the project_state is None.

Contributor Author

In other places we use the projects map directly, and from what I see this is now the only place where we call self.get_or_create_project(...).get_cached_state(...), which initiates the upstream request to update the sampling state if it's not in the cache.

So if the project_state is None we have most probably already requested an update for it, and for now we just queue the envelope; at the same time we also request the sampling state so that we can process the queued envelopes once the states are updated.

But maybe I'm missing something.

Member

Sorry, you are right. I overlooked that get_cached_state triggers a fetch from the upstream.

@olksdr olksdr requested a review from jjbayer March 2, 2023 17:42
Member

@jjbayer jjbayer left a comment

Please read the comments (especially the one about root_key). Nothing blocking though.

@@ -374,13 +374,16 @@ struct UpdateProjectState {

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
struct QueueKey {
    key: ProjectKey,
    root_key: ProjectKey,
Member

I'm afraid root is too ambiguous here, because it's also used to refer to the "root" of the trace, which is the sampling project. I have no better name to be honest. Maybe own_key or project_key?


// We return false if project is not cached or its state is invalid.
// We return false if project is not cached or its state is invalid, true otherwise.
Member

Suggested change
// We return false if project is not cached or its state is invalid, true otherwise.
// We return false if project is not cached or its state is invalid, true otherwise.
// We only have to check `partial_key`, because we already know that the `project_key`s `state` is valid and loaded.

    .and_then(|key| self.projects.get(&key))
    .and_then(|p| p.valid_state());

self.handle_processing(state.clone(), sampling_state, envelope, envelope_context)
Member

Suggested change
self.handle_processing(state.clone(), sampling_state, envelope, envelope_context)
self.handle_processing(state.clone(), sampling_state, envelope, envelope_context);

let ValidateEnvelope { envelope, context } = message;

// Fetch the project state for our key and make sure it's not invalid.
let root_key = envelope.meta().public_key();
Member

See comment on ambiguity of "root project".

/// Metrics associated with the sampling project (a.k.a. root or head project)

@@ -131,11 +133,12 @@ impl CheckEnvelope {
/// [`CheckEnvelope`]. Once the envelope has been validated, remaining items are forwarded to the
/// next stage:
///
/// - If the envelope needs dynamic sampling, this sends [`AddSamplingState`] to the
/// [`ProjectCache`] to add the required project state.
/// - If the envelope needs dynamic sampling, and the project state is not cached or out of the
Member

nit: We could rewrite parts of the paragraph before. It talks about stages, but now we just wait for both project states concurrently (or simultaneously) and then move on to CheckEnvelope.

@@ -400,6 +373,92 @@ struct UpdateProjectState {
    no_cache: bool,
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone, Copy)]
Member

cc @jjbayer, do we want to define the order of derives in the coding guideline?

Member

@jan-auer what rule would you suggest? Derives are order-sensitive, so there are some limits (see rust-lang/rustfmt#1867 (comment)).

@olksdr olksdr merged commit 35f5d86 into master Mar 6, 2023
@olksdr olksdr deleted the feat/project-cache-queue branch March 6, 2023 14:09
olksdr added a commit that referenced this pull request Mar 6, 2023
olksdr added a commit that referenced this pull request Mar 6, 2023