-
-
Notifications
You must be signed in to change notification settings - Fork 253
fix: shield coverage result polling #6847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@metamaskbot publish-preview |
@metamaskbot publish-preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note my comment on using a singular abort controller for unrelated result requests. I think it's highly problematic. You only want to cancel polling for that specific coverageId, not for all coverage requests.
const simulationDataChanged = this.#compareTransactionSimulationData( | ||
previousTransaction?.simulationData, | ||
transaction.simulationData, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bot must have had a bad day. That's not the case, as far as I can see.
pollInterval?: number; | ||
}, | ||
): Promise<GetCoverageResultResponse> { | ||
if (this.#abortController && !this.#abortController.signal.aborted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you integrated the abort functionality is problematic.
You use one abort controller for all requests.
There may be situations where you have result polling for different transactions and signatures at the same time. If one log request comes in, it cancels the polling from all other unrelated transactions and signatures. That's most definitely not what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the design with Map<transactionId | signatureId, AbortController
to handle multiple requests.
9225267
let shouldContinuePolling = true; | ||
const poll = async (): Promise<GetCoverageResultResponse> => { | ||
// Poll until the coverage result is ready or the abort signal is triggered. | ||
while (shouldContinuePolling && !abortController.signal.aborted) { | ||
const startTime = Date.now(); | ||
const res = await this.#fetch(configs.coverageResultUrl, { | ||
method: 'POST', | ||
headers, | ||
body: JSON.stringify(reqBody), | ||
signal: abortController.signal, | ||
}); | ||
if (res.status === 200) { | ||
shouldContinuePolling = false; | ||
return (await res.json()) as GetCoverageResultResponse; | ||
} | ||
if (!abortController.signal.aborted) { | ||
await sleep(pollInterval - (Date.now() - startTime)); | ||
} | ||
// The following line will not have an effect as the upper level promise | ||
// will already be rejected by now. | ||
throw new Error('unexpected error'); | ||
}; | ||
} | ||
// The following line will not have an effect as the upper level promise | ||
// will already be rejected by now. | ||
throw new Error('unexpected error'); | ||
}; | ||
|
||
poll().then(resolve).catch(reject); | ||
}); | ||
return await withTimeoutAndCancellation<GetCoverageResultResponse>( | ||
poll, | ||
timeout, | ||
abortController, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of rewriting the entire logic, you could have just used the existing variable timeoutReached
(maybe rename it) to cancel the polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid refactoring existing logic, if not necessary. I suspect the logic might have been unclear and that's why you have rewritten it. Instead you could also check with the author to clarify the existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore the existing logic and appended the cancellation logic in this commit, 9225267
|
||
const abortHandler = () => { | ||
timeoutReached = true; | ||
this.#removeAbortHandler(configs.requestId, abortHandler); | ||
reject(new Error('Coverage result polling cancelled')); | ||
}; | ||
abortController.signal.addEventListener('abort', abortHandler); | ||
|
||
setTimeout(() => { | ||
timeoutReached = true; | ||
this.#removeAbortHandler(configs.requestId, abortHandler); | ||
reject(new Error('Timeout waiting for coverage result')); | ||
}, timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove some redundancy here. Would it be possible to have the cleanup and rejection in just one place? Maybe you could simplify call abortController.abort()
when the timeout triggers?
timeoutReached = true;
this.#removeAbortHandler(configs.requestId, abortHandler);
reject(new Error(msg));
setTimeout(() => { | ||
timeoutReached = true; | ||
this.#removeAbortHandler(configs.requestId, abortHandler); | ||
reject(new Error('Timeout waiting for coverage result')); | ||
}, timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are already using an AbortSignal above, you could also use an AbortSignal.timeout
here (instead of setTimeout
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please advise me if I'm wrong.
One problem with using AbortSignal.timeout
is that, we can't manually cancel the timeout.
Let's say when another request comes in, we need to cancel the pending requests right?
Cancelling request needs two things here
- cancel the abort controller (to stop polling)
- cancel the timeout
For the timeout cancel, if we are using JS timers, we can simply cancel the timers. However, there's no way to cancel the AbortSignal.timeout
, which will eventually throw which would be problematic, too.
method: 'POST', | ||
headers, | ||
body: JSON.stringify(reqBody), | ||
signal: abortController.signal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not respect the timeout.
You can have a look at my draft, how the two can be combined:
timeoutSignal.onabort = () => abortController.abort(); |
Similarly, the while
condition does not respect the abortController.signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this doesn't respect the timeout. But this cancel the ongoing network request, when the abortController is aborted (aka new request is triggered).
To be frank, this (AbortController with network request) is a very common pattern (especially in frontend/react).
Ref: https://stackoverflow.com/a/61056569
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, the while condition does not respect the abortController.signal.
When abortController.signal
is aborted, we set timeoutReached=true
where break the while
loop.
const abortController = this.#abortControllerMap.get(requestId); | ||
if (abortController) { | ||
if (abortHandler) { | ||
abortController.signal.removeEventListener('abort', abortHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering: Is it necessary to remove the handler, if we drop the whole controller anyways? (Might simplify a few things.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clean up (removeEventListener
) whenever we do addEventListener
.
Eventually, the event listeners will be clean up when the event target (AbortController) is garbage collected. But still memory leaks can if AbortController is not GC, due to some unknown reasons :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any use of mocking the AbortController. On the downside, it bloats the codebase and can also cause some issues (e.g., it currently doesn't work with AbortSignal.any
). We should remove this mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad on not removing the MockAbortController when it's no longer needed anymore.
I will take note of that, and will take a closer and more careful look in the future. 🙏
const coverageId = 'coverageId'; | ||
|
||
// Mock fetch responses | ||
fetchMock.mockImplementation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mock implementation is in parts redundant to the other fetchMock implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these new tests were refactored and removed here, 565c42c
const signatureRequest = generateMockSignatureRequest(); | ||
const coverageId = 'test'; | ||
|
||
fetchMock.mockImplementation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mock implementation is in parts redundant to the other fetchMock implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these new tests were refactored and removed here, 565c42c
}; | ||
} | ||
|
||
#cancelPreviousPendingRequests(id: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#cancelPreviousPendingRequests(id: string) { | |
#cancelPendingRequests(id: string) { |
If it's "pending" request, it's always a "previous" request. So we can omit previous
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these logics were refactored and removed here, 565c42c
coverageId, | ||
}; | ||
|
||
const abortController = this.#assignNewAbortController(configs.requestId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you create this here and not within the Promise?
(We could probably move all things from the outer context into the Promise. Might be cleaner overall.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these logics were refactored and removed here, 565c42c
|
||
const abortHandler = () => { | ||
timeoutReached = true; | ||
this.#removeAbortHandler(configs.requestId, abortHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to here, but could we change configs
to config
? Configuration is usually used in singular. Alternatively, you could use opts
or options
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these logics were refactored and removed here, 565c42c
@metamaskbot publish-preview |
this.#pollingWithTimeout | ||
.pollRequest(config.requestId, requestCoverageFn, pollingOptions) | ||
.then(resolve) | ||
.catch(reject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refactor makes everything much more readable. 👍
Now you can even drop the wrapping new Promise
and just write return await this.#pollingWithTimeout.pollRequest(...);
.
throw new Error(`${pollingOptions.fnName}: Request cancelled`); | ||
} | ||
} | ||
await this.#delay(pollInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not canceled by the abort signal. So if we have a long polling interval, then it can take some time until this returns.
Previously this was achieved by rejecting immediately when the timeout occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the abort doesn't cancelled the delay between polling.
But I don't think this is a big issue, a small delay won't hurt so much. After the delay, the polling will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a cancellation to the delay here, 27a1acb
try { | ||
const result = await requestFn(abortController.signal); | ||
// polling success, we just need to clean up the request entry and return the result | ||
this.#cleanUpOnFinished(requestId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have to call this cleanup function also in the catch
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, coz cleanUpOnFinished
will be called in the AbortHandler
and Timeout
handle, where we only want to clean up.
We don't want to clean in case of normal errors, we want the polling to continue.
this.abortPendingRequests(requestId); | ||
|
||
// insert the request entry for the next polling cycle | ||
const { abortController } = this.#insertRequestEntry(requestId, timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole functionality of this is a bit difficult to grasp as it's spread across the whole file and different handlers (especially the different cleanup handlers are confusing).
Would it be possible to streamline the logic a bit:
- Create 1 abort controller. If this aborts, the polling aborts and the abort handling is cleaned up.
- Attached the timeout event to the abort controller. (i.e., on timeout, call
controller.abort()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed it here, 27a1acb
@metamaskbot publish-preview |
Preview builds have been published. See these instructions for more information about preview builds. Expand for full list of packages and versions.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I really don't want to drag this out any longer, but I still don't get it.
Why do you have separate cleanup functions? Could it happen that one is called without the other? Would that be a problem? (e.g., could it happen that the event handler doesn't get removed?) Would it be possible to merge these two cleanup functions and just always call the same one?
Let's have a call to clear things up.
Update:
In fact, I think I found a case where things don't get cleaned up properly. If user calls abortPendingRequest
, then the entry is removed from map and timeout is cleared, but the abort event handler is not removed from the signal.
More concretely:
abortPendingRequest
calls abortController.abort
after the entry has already been removed. So when the abortHandler
triggers and the handler calls cleanUpOnFinished
, the entry is not found and the event handler is not removed.
Update2:
We had a call and noted the following:
- The current cleanup after
abortPendingRequest
indeed doesn't remove the event handler, which can potentially cause a memory leak. - The cleanup functions could be merged into one to simplify the cleanup process independent of how the signal abort is triggered.
- Unsolved issue: We noted that the current polling logic expects that everything is cleaned up before the next polling starts. However, the cleanup happens asynchronously, so we need to implement a logic that lets the polling logic await the finished cleanup.
TODO:
- Understand in detail if not cleaning up the event handler causes a memory leak. If that is not the case, then we don't have to worry about that.
- Merge cleanup logic into one function. Always use the same cleanup flow. (i.e., let the abort controller trigger the cleanup logic on abort)
- Implement functionality so that we can await the cleanup to be finished before we start the polling from anew.
Bonus:
Since we are spending quite a bit of time on this, and getting the logic right appears non-trivial, we might want to revisit a suggestion made by @mcmire earlier to use the Cockatiel library. #6137 (comment)
} | ||
|
||
/** | ||
* Abort the pending requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
* Abort the pending requests. | |
* Abort the pending request. |
because you changed the function name
Just curious, why did you rename this function from abortPendingRequests
to abortPendingRequest
?
// firstly clean up the request entry if it exists | ||
// note: this does not abort the request, it only cleans up the request entry for the next polling cycle | ||
const existingEntry = this.#cleanUpRequestEntryIfExists(requestId); | ||
// then abort the request if it exists | ||
// note: this does abort the request, but it will not trigger the abort handler (hence, {@link cleanUpRequestEntryIfExists} will not be called) | ||
// coz the AbortHandler event listener is already removed from the AbortSignal | ||
existingEntry?.abortController.abort(this.ABORT_REASON_CANCELLED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still don't get it. Why don't you simply call abortController.abort
here? Wouldn't that be much simpler?
// firstly clean up the request entry if it exists | |
// note: this does not abort the request, it only cleans up the request entry for the next polling cycle | |
const existingEntry = this.#cleanUpRequestEntryIfExists(requestId); | |
// then abort the request if it exists | |
// note: this does abort the request, but it will not trigger the abort handler (hence, {@link cleanUpRequestEntryIfExists} will not be called) | |
// coz the AbortHandler event listener is already removed from the AbortSignal | |
existingEntry?.abortController.abort(this.ABORT_REASON_CANCELLED); | |
const requestEntry = this.#requestEntries.get(requestId); | |
requestEntry?.abortController.abort(this.ABORT_REASON_CANCELLED); |
Could there be a case where existingEntry
is undefined here and the controller doesn't get aborted?
|
||
const abortHandlerForDelay = () => { | ||
// clear the timeout and resolve the promise | ||
// Note: we don't reject the promise as this is only a dummy delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite unusual. The convention is that if a signal gets aborted, then the function throws an error, afaik. Why did you decide to do it differently here? I usually suggest to stick to conventions because otherwise people will have a hard time understanding the code.
* | ||
* @param ms - The number of milliseconds to delay. | ||
* @param abortSignal - The abort signal to listen to. | ||
* @returns A promise that resolves when the delay is complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @returns A promise that resolves when the delay is complete. | |
* @returns A promise that resolves when the delay is complete or when the signal is aborted. |
#cleanUpOnFinished(requestId: string): RequestEntry | undefined { | ||
const requestEntry = this.#cleanUpRequestEntryIfExists(requestId); | ||
if (requestEntry) { | ||
requestEntry.abortController.signal.removeEventListener( | ||
'abort', | ||
requestEntry.abortHandler, | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the bot is right here, but I have a related question:
Why do you have two cleanup functions, where one only cleans up part of it (except for the handler)?
Couldn't we use just one cleanup function in all cases?
resolve(undefined); | ||
} | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
if (abortSignal.aborted) { | ||
reject(new Error(this.ABORT_REASON_CANCELLED)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
82bedfc
to
67ee8dd
Compare
// TODO: remove | ||
#unregisterListeners(disposableListeners: { dispose: () => void }[]) { | ||
disposableListeners.forEach((disposable) => disposable.dispose()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abortSignal.addEventListener('abort', abortHandlerForDelay, { | ||
once: true, // only listen to the abort event once | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Abort Handling Fails, Leading to Resource Leaks
In PollingWithTimeoutAndAbort
's #delayWithAbortSignal
, an early abort signal causes the method to reject but continue execution, setting up unnecessary timers and event listeners, which can lead to resource leaks. Additionally, the PollingWithCockatielPolicy
test for request abortion has a mock requestFn
that unconditionally rejects, bypassing the abortSignal.aborted
check and causing incorrect test behavior.
Explanation
This PR includes ~
TransactionController:stateChange
subscriber to avoid triggering multiple check coverage result unnecessarily.Personal Sign
check from the check signature coverage result.References
Checklist
Note
Make coverage-result polling abortable with timeouts and request-level cancellation, refine transaction simulation-change detection, remove PersonalSign-only check, add new polling utilities/policies and tests, and update deps.
src/backend.ts
):PollingWithTimeoutAndAbort
; add request-level abort, configurable timeouts/intervals, and cancellation before new polling/logging.requestId
to result polling; update timeout error togetCoverageResult: Request timed out
; removesleep
helper.src/ShieldController.ts
):TransactionMeta.simulationData
via#compareTransactionSimulationData
to avoid redundant checks.PersonalSign
restriction; check signature coverage for any new signature request.polling-with-timeout-abort
andpolling-with-policy
(Cockatiel-based) with comprehensive tests.afterEach
cleanup.TX_META_SIMULATION_DATA_MOCKS
to cover simulation-data change scenarios.@metamask/controller-utils
andcockatiel
.Personal Sign
check.Written by Cursor Bugbot for commit 67ee8dd. This will update automatically on new commits. Configure here.