fix: read exactly M values from channel by sbackend123 · Pull Request #5447 · ethersphere/bee

sbackend123 · 2026-04-26T18:47:19Z

Checklist

I have read the coding guide.
My change requires a documentation update, and I have done it.
I have added tests to cover my changes.
I have filled out the description and linked the related issues.

Description

Read exactly M values from the channel instead of reading "forever", which could deadlock in rare cases, observed mostly in CI.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

runStrategy assumes that, while it consumes results from c, one of its two exit conditions will always become true before all reads are exhausted. Because of that assumption, the previous code used a for range c loop, which terminates only when c is closed.

Why this can deadlock
In the DATA -> RACE fallback path, some DATA-owned shard fetches may still be finishing while RACE starts.
RACE can include data shard indices whose waits[i] is not closed yet; for those indices, fetch takes the fly=false path and waits on waits[i].
For a successful DATA fetch, the operation order is:

```go
close(g.waits[i])       // 1) unblocks waiting fly=false RACE goroutine
g.fetchedCnt.Add(1)     // 2) increments success counter
```

There is a small window between (1) and (2):

the waiting RACE goroutine can unblock and push its result into c,
runStrategy can consume that last value from c,
but fetchedCnt may still be stale for that check.

If this happens on the last available message in c, neither exit condition may trigger in that iteration, and the next read blocks forever (no more writers, channel not closed).
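The window can be replayed deterministically by inserting an artificial pause between the two steps. This is a simplified model that assumes a single shard i. `proceed` and `done` are test hooks that do not exist in bee. The channel and counter only stand in for g.waits[i] and g.fetchedCnt.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// simulateWindow replays the interleaving described above deterministically.
// The proceed channel is an artificial hook (not in bee) that holds the DATA
// goroutine between step 1 (close) and step 2 (counter increment).
func simulateWindow() (seen, final int64) {
	var fetchedCnt atomic.Int64 // requires Go 1.19+
	wait := make(chan struct{}) // plays the role of g.waits[i]
	c := make(chan error, 1)    // results channel read by the strategy loop
	proceed := make(chan struct{})
	done := make(chan struct{})

	go func() { // DATA fetch for shard i
		close(wait)       // 1) unblocks the waiting RACE goroutine
		<-proceed         // the window between 1) and 2)
		fetchedCnt.Add(1) // 2) increments the success counter
		close(done)
	}()
	go func() { // RACE goroutine on the fly=false path
		<-wait
		c <- nil // reports success for shard i
	}()

	<-c                      // the strategy loop consumes the last value...
	seen = fetchedCnt.Load() // ...but the counter is still stale
	close(proceed)
	<-done
	return seen, fetchedCnt.Load()
}

func main() {
	seen, final := simulateWindow()
	fmt.Println(seen, final) // 0 1
}
```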

Related Issue (Optional)

Screenshots (if appropriate):

AI Disclosure

This PR contains code that has been generated by an LLM.
I have reviewed the AI generated code thoroughly.
I possess the technical expertise to responsibly review the code generated in this PR.

nugaon · 2026-05-05T19:53:38Z

I also fixed this problem, incidentally, in #5449, but that PR solves more: on an early return from the DATA strategy, c is drained fully, so the subsequent RACE strategy does not have to deal with stale counters.

Edit: still, it makes sense to return a dedicated error at the end of the block, even if the assumption should now hold. That return should never be reached, and a named error would make it obvious if it ever is.

acud · 2026-05-05T23:05:41Z

@nugaon I don't fully understand why the buffered channel is needed in the first place. This is an anti-pattern in Go and goes against our coding guidelines: channel size should be one or none. If we respected context cancellation and avoided returns that can block forever on a channel (i.e., used a select statement), we would not need this channel-draining logic, which I honestly find confusing.

nugaon · 2026-05-06T08:36:32Z

In fan-in cases like this, where we have an exact number of goroutines, sizing the channel buffer to the number of goroutines is reasonable, because the sends then never block.

We could use a WaitGroup and wait for all goroutines to return, but that is just an additional primitive for the same result, and it makes the code a bit more complex.
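Here is a sketch of the WaitGroup variant, assuming the consumer wants every result. `collectAll` and `work` are hypothetical names. The extra goroutine that waits and then closes c is the additional primitive. In exchange, the consumer can use a plain range loop instead of counting reads.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// collectAll closes c once every worker has returned, which lets the consumer
// use a plain for-range loop. The trade-off is that the consumer must read
// everything: with an unbuffered c, returning early would leave senders blocked.
// work is a hypothetical stand-in for a shard fetch.
func collectAll(n int, work func(i int) error) []error {
	c := make(chan error)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c <- work(i)
		}(i)
	}
	go func() {
		wg.Wait()
		close(c) // ends the range loop below
	}()

	var errs []error
	for err := range c {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	errs := collectAll(3, func(i int) error {
		if i == 0 {
			return errors.New("miss")
		}
		return nil
	})
	fmt.Println(len(errs)) // 3
}
```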

fix: read exactly M values from channel

cb5785e

sbackend123 marked this pull request as ready for review April 26, 2026 18:47

acud requested changes Apr 27, 2026


Comment thread pkg/file/redundancy/getter/getter.go

acud approved these changes Apr 27, 2026


martinconic approved these changes Apr 28, 2026


akrem-chabchoub approved these changes Apr 28, 2026


gacevicljubisa approved these changes Apr 30, 2026


acud mentioned this pull request May 5, 2026

fix: draining strategy fetch #5449

Open



sbackend123 wants to merge 1 commit into master from fix/testGetterRACE


6 participants
