Fail lock acquisition if service already holds one.#24782
Fail lock acquisition if service already holds one.#24782patrickmann merged 6 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a defensive check to prevent the RefreshingLockService from accidentally overwriting an existing lock. When attempting to schedule a new lock, the service now validates that it doesn't already hold a lock, throwing an IllegalStateException if it does. This prevents potential lock management bugs in the cluster where an active lock could be lost.
Changes:
- Added a null check for existing locks before scheduling a new lock
- Throws
IllegalStateExceptionwith descriptive message if a lock is already held
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (lock != null) { | ||
| throw new IllegalStateException("Unable to acquire new lock, already holding lock that would get lost: " + lock); | ||
| } |
There was a problem hiding this comment.
The new lock validation logic should be covered by a test case that verifies the IllegalStateException is thrown when attempting to schedule a lock while already holding one.
|
@dennisoelkers I've opened a new pull request, #24830, to work on those changes. Once the pull request is ready, I'll request review from you. |
…tion (#24830) * Initial plan * Add tests for lock acquisition failure when already holding a lock Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> * Use @ExtendWith and extract mock setup to beforeEach Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> * Remove accidentally committed test output log Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> * Move locks to field initializers and mocks to test methods Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> * Remove test_output.log from .gitignore Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> * Use real single-threaded executor instead of mock Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: dennisoelkers <41929+dennisoelkers@users.noreply.github.com>
There was a problem hiding this comment.
The lock leak issue seems legit:
Bug: Lock leak when the guard triggers
The check is in the wrong place. By the time scheduleLock() is called, the new lock has already been acquired from lockService.lock(). When the IllegalStateException fires, the newly acquired lock is never released — it's leaked.
// In acquireAndKeepLock:
Optional<Lock> optionalLock = lockService.lock(resource, maxConcurrency); // Lock acquired!
if (optionalLock.isEmpty()) {
throw new AlreadyLockedException(...);
}
scheduleLock(optionalLock.get()); // IllegalStateException thrown here → lock leakedThe second lock is now held in the underlying LockService (e.g. in MongoDB) but nobody will ever release or refresh it. This could block other nodes from acquiring that resource.
Fix: Move the guard to the beginning of both acquireAndKeepLock methods, before calling lockService.lock():
public void acquireAndKeepLock(String resource, int maxConcurrency) throws AlreadyLockedException {
if (lock != null) {
throw new IllegalStateException("Unable to acquire new lock, already holding lock that would get lost: " + lock);
}
Optional<Lock> optionalLock = lockService.lock(resource, maxConcurrency);
// ...
}Or alternatively, catch the exception and release the new lock before rethrowing.
Test gap
The tests verify the exception is thrown but don't assert that the second lock is properly cleaned up (released). Given the bug above, adding such an assertion (e.g. verify(lockService).unlock(secondLock)) would actually fail and expose the leak.
Also, the mock setup for the second lock (when(lockService.lock(eq("second-resource"), ...))) is unnecessary if the check is moved before the lockService.lock() call, since it would never be reached.
Minor notes
- The
lockfield is notvolatileand access is unsynchronized — this is pre-existing and out of scope, but worth noting since the new guard readslockwithout synchronization.
…ling, removing now unnecessary mocks.
patrickmann
left a comment
There was a problem hiding this comment.
Looks good. Since this is a defensive fix there's no way to test it in action; unit tests are fine.
Description
Motivation and Context
The commit adds a safety check to prevent the
RefreshingLockServicefrom acquiring a new lock when it already holds an existing one. Before scheduling a new lock, the code now checks if the current lock is not null, and if so, throws anIllegalStateExceptionindicating that acquiring a new lock would cause the existing lock to be lost.This is a defensive programming improvement that prevents potential bugs where a service might accidentally overwrite an active lock, which could lead to lock management issues in the cluster.
/nocl Internal refactoring.
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: