Conversation

@girino (Contributor) commented Oct 5, 2025

This solves issue #99. (I used ChatGPT to help me explain the pull request.)

Why Goroutines Kept Running After Timeout

When the context times out, the main routine continues execution, but the goroutine keeps running in the background.
Each new iteration spawns another goroutine, so multiple background goroutines end up modifying the shared pubkeyFollowerCount map at the same time.
This leads to a race condition that manifests later in updateWoTMap().

Problem Explanation

go func() {
    defer cancel()
    events := pool.FetchMany(timeoutCtx, config.ImportSeedRelays, filter)
    for ev := range events {
        // (extraction of contact from ev elided)
        pubkeyFollowerCount[contact[1]]++ // shared map write
    }
    close(done)
}()

select {
case <-done:
    log.Println("🕸️ analysed", nPubkeys)
case <-timeoutCtx.Done():
    log.Println("🚫 timeout while fetching pubkeys")
}

Step by step:

  1. The goroutine starts fetching and processing events.
  2. If timeoutCtx expires first, the select unblocks and the main routine continues to the next iteration.
  3. The goroutine does not stop — it keeps running because nothing waits for or cancels it. (there is no check for context timeout inside the goroutine)
  4. Each new iteration launches another goroutine that also writes to pubkeyFollowerCount.
  5. When updateWoTMap() runs, previous goroutines may still be writing → data race.

Fix

Replace the goroutine with an immediately invoked synchronous function:

func() {
    defer cancel() // release the context on every exit path
    events := pool.FetchMany(timeoutCtx, config.ImportSeedRelays, filter)
    for ev := range events {
        // (extraction of contact from ev elided)
        pubkeyFollowerCount[contact[1]]++
    }
}()

Why Running Synchronously Is Safe

Running the function synchronously does not harm performance or behavior because:

  1. The main routine was already waiting on the done channel before proceeding, so the code effectively behaved synchronously already — just with unnecessary concurrency and potential races.
  2. Timeouts are still enforced by FetchMany, which respects timeoutCtx. If the timeout triggers, the function returns promptly.
  3. Post-fetch processing is lightweight, running much faster than FetchMany itself, so executing it inline adds negligible latency.

In short, removing the goroutine eliminates background writes without changing timing or responsiveness — the main routine still respects the timeout, and execution remains efficient and deterministic.

@aaccioly (Collaborator) commented Oct 6, 2025

Hi @girino, many thanks for your contribution. You're absolutely correct about the concurrency issue here. The story behind this goroutine is that, regardless of context timeouts, in certain corner cases Haven was getting stuck (to be fair, Haven was using FetchManyReplaceable, which is more complex and less battle-tested than FetchMany).

Also, even when FetchMany does return on cancel, it often silently skips events, so we need to alert users when that happens.

My take is that we’ll eventually need to bite the bullet and tackle potential race conditions in the WoT map. This makes even more sense considering your feature request about dynamically updating the WoT. We’d need to not only check if the context is done as part of the write loop but also consider using sync.Map, mutexes, etc.

That said, IMO these are bigger changes for a v2. For now, I’m happy to merge these changes. I’d just like to ask you to:

  1. Remove the self-invoking function unless it’s actually needed in the synchronous flow for whatever reason.
  2. When a timeout has been reached, log a message warning the user that some events may have been skipped (as before).
  3. Remove the comments in lines 53–56, as they won't make sense once the goroutine itself is gone.

@girino (Contributor, author) commented Oct 6, 2025

1. The function is not needed per se, but it helps cancel the context properly regardless of the exit path (via defer cancel()).
2 and 3: I'll probably only have time to work on them next weekend.

@aaccioly (Collaborator) commented Oct 6, 2025

No problem and no rush (it is not like this race condition is new 🤣). If you want to grant me access I can also try to have a look. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

@aaccioly (Collaborator) commented

Hey @girino, just double-checking: are you still planning to implement the changes above? Otherwise let me know. I'm making some changes to the import flow myself and am happy to take this over.

@girino (Contributor, author) commented Oct 27, 2025

I probably won't have free time to work on this in the near future. You can take it over, or just ignore it if you wish.
