Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: support concurrent read/write from same active segment. #45

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ankur-anand
Copy link

Currently, wal.Read supports concurrent reads without data races only when using the exact chunk position, as it assumes that WAL is actively being written.

wal/wal.go

Line 426 in 9f1a618

func (wal *WAL) Read(pos *ChunkPosition) ([]byte, error) {

However, if parallel/concurrent log replay is needed, the active segment modifies certain fields in the segment struct during writes. These same fields are also accessed by reader instances, leading to potential data races:

currentBlockNumber uint32
currentBlockSize   uint32
closed             bool

To enable safe concurrent reads while the WAL is being written, this PR updates these fields to use atomic operations:

currentBlockNumber atomic.Uint32
currentBlockSize   atomic.Uint32
closed             atomic.Bool

By making these changes, the WAL will support parallel log replay from different readers without data races, ensuring consistency and thread safety.

Changes in this PR
• Replaced uint32 and bool fields with their atomic counterparts.
• Ensured all accesses to these fields use atomic operations.

@ankur-anand
Copy link
Author

ankur-anand commented Feb 7, 2025

Some of the concurrent race conditions that were observed.

==================
WARNING: DATA RACE
Write at 0x00c00003e978 by goroutine 54:
  github.com/rosedblabs/wal.(*segment).Close()
      /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/segment.go:172 +0x448
  github.com/rosedblabs/wal.(*WAL).Close()
      /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/wal.go:442 +0x40c
      
      
Previous read at 0x00c00003e978 by goroutine 80:
  github.com/rosedblabs/wal.(*segmentReader).Next()
      /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/segment.go:481 +0x48
  github.com/rosedblabs/wal.(*Reader).Next()
      /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/wal.go:262 +0xb4
==================
WARNING: DATA RACE
Write at 0x00c00003e978 by goroutine 54:
github.com/rosedblabs/wal.(*segment).Close()
   /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/segment.go:172 +0x448
github.com/rosedblabs/wal.(*WAL).Close()
   /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/wal.go:442 +0x40c
  
Previous read at 0x00c00003e978 by goroutine 80:
github.com/rosedblabs/wal.(*segmentReader).Next()
   /Users/ankur/go/pkg/mod/github.com/rosedblabs/[email protected]/segment.go:481 +0x48
github.com/rosedblabs/wal.(*Reader).Next()

I agree that the closed variable data race,part will not happen until Wal.Close is called concurrently without Releasing the reader, but should be protected from a data-race condition.

@ankur-anand
Copy link
Author

\cc @roseduan

@roseduan
Copy link
Contributor

\cc @roseduan

Thanks, I will recheck this later.

@XiXi-2024
Copy link

I understand the purpose of this PR:

  1. When WAL instance executes NewReader, it adds the active segment file to Readers without setting a new active segment file
  2. Since Next is not locked, it may concurrently execute readInternal method on the active segment file
  3. The readInternal method accesses fields like currentBlockNumber, currentBlockSize, and closed, while Write method may modify these fields simultaneously, potentially causing data races

@XiXi-2024
Copy link

I believe changing to atomic types won't solve the concurrency issues. For the closed field, it may still lead to reading from a closed file, and since currentBlockNumber and currentBlockSize fields need to be updated synchronously, concurrent access would still cause race conditions.

@ankur-anand
Copy link
Author

@XiXi-2024

I believe changing to atomic types won't solve the concurrency issues. For the closed field, it may still lead to reading from a closed file, and since currentBlockNumber and currentBlockSize fields need to be updated synchronously, concurrent access would still cause race conditions.

You're right. But all the descriptors were closed, only when Wal is closed. So most of reader should be closed by the application owner before calling the wal.Close.

But then again this leaves the library not dealing with it, which should be avoided. I've updated the PR to cover those cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants