Conversation

@girino (Contributor) commented Oct 5, 2025

This solves issue #99. (I used ChatGPT to help me explain the pull request.)

Why Goroutines Kept Running After Timeout

When the context times out, the main routine continues execution, but the goroutine keeps running in the background.
Each new iteration spawns another goroutine, so multiple background goroutines end up modifying the shared pubkeyFollowerCount map at the same time.
This leads to a race condition that manifests later in updateWoTMap().

Problem Explanation

go func() {
    defer cancel()
    events := pool.FetchMany(timeoutCtx, config.ImportSeedRelays, filter)
    for ev := range events {
        // (extraction of contact from ev elided)
        pubkeyFollowerCount[contact[1]]++ // shared map write
    }
    close(done)
}()

select {
case <-done:
    log.Println("🕸️ analysed", nPubkeys)
case <-timeoutCtx.Done():
    log.Println("🚫 timeout while fetching pubkeys")
}

Step by step:

  1. The goroutine starts fetching and processing events.
  2. If timeoutCtx expires first, the select unblocks and the main routine continues to the next iteration.
  3. The goroutine does not stop — it keeps running because nothing waits for or cancels it. (there is no check for context timeout inside the goroutine)
  4. Each new iteration launches another goroutine that also writes to pubkeyFollowerCount.
  5. When updateWoTMap() runs, previous goroutines may still be writing → data race.

Fix

Replace the goroutine with an immediately invoked synchronous function:

func() {
    defer cancel() // release the context on every exit path
    events := pool.FetchMany(timeoutCtx, config.ImportSeedRelays, filter)
    for ev := range events {
        // (extraction of contact from ev elided)
        pubkeyFollowerCount[contact[1]]++
    }
}()

Why Running Synchronously Is Safe

Running the function synchronously does not harm performance or behavior because:

  1. The main routine was already waiting on the done channel before proceeding, so the code effectively behaved synchronously already — just with unnecessary concurrency and potential races.
  2. Timeouts are still enforced by FetchMany, which respects timeoutCtx. If the timeout triggers, the function returns promptly.
  3. Post-fetch processing is lightweight, running much faster than FetchMany itself, so executing it inline adds negligible latency.

In short, removing the goroutine eliminates background writes without changing timing or responsiveness — the main routine still respects the timeout, and execution remains efficient and deterministic.

@aaccioly (Collaborator) commented Oct 6, 2025

Hi @girino, many thanks for your contribution. You're absolutely correct about the concurrency issue here. The story behind this goroutine is that, regardless of context timeouts, in certain corner cases Haven was getting stuck (to be fair, Haven was using FetchManyReplaceable, which is more complex and less battle-tested than FetchMany).

Also, even when FetchMany does return on cancel, it often silently skips events, so we need to alert users when that happens.

My take is that we’ll eventually need to bite the bullet and tackle potential race conditions in the WoT map. This makes even more sense considering your feature request about dynamically updating the WoT. We’d need to not only check if the context is done as part of the write loop but also consider using sync.Map, mutexes, etc.

That said, IMO these are bigger changes for a v2. For now, I’m happy to merge these changes. I’d just like to ask you to:

  1. Remove the self-invoking function unless it’s actually needed in the synchronous flow for whatever reason.
  2. When a timeout has been reached, log a message warning the user that some events may have been skipped (as before).
  3. Remove the comments in lines 53–56, as they won't make sense once the goroutine itself is gone.

@girino (Contributor, author) commented Oct 6, 2025

1. The function is not needed per se, but it helps cancel the context properly regardless of the exit path (via defer cancel()).
2 and 3: I'll probably only have time to work on them next weekend.

@aaccioly (Collaborator) commented Oct 6, 2025

No problem and no rush (it is not like this race condition is new 🤣). If you want to grant me access I can also try to have a look. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

@aaccioly (Collaborator) commented

Hey @girino, just double-checking: are you still planning to implement the changes above? Otherwise let me know. I'm making some changes to the import flow myself and am happy to take this over.

@girino (Contributor, author) commented Oct 27, 2025

I probably won't have free time to work on this in the near future. You can take it over, or just ignore it if you wish.
