fix(hub,cli): four hub-restart-cascade cleanup bugs (#913 #914 #916 #919) #923
Open
heavygee wants to merge 5 commits into tiann:main from heavygee:fix/hub-restart-cleanup-bundle
Changes from 3 commits
Commits
928b1b6 fix(hub,cli): four hub-restart-cascade cleanup bugs (#913 #914 #916 #… (heavygee)
1c8972a fix(cli): runner-spawned children use 'Stopped by runner' as default … (heavygee)
123c625 fix(hub): markSessionArchivedFromHub surfaces persistence failures as… (heavygee)
a97b9dc revert(cli): drop HAPI_DEFAULT_ARCHIVE_REASON env override (heavygee)
98b1031 fix(cli): clean completions get 'Session completed', not 'Hub restart' (heavygee)
    @@ -0,0 +1,127 @@
    import { describe, expect, it, vi } from 'vitest'
    import { createRunnerLifecycle } from './runnerLifecycle'

    // tiann/hapi#914: the runnerLifecycle's default archiveReason is now
    // 'Hub restart' (was 'User terminated'). Out-of-band SIGTERM from the
    // hub-restart cascade keeps that default. Explicit user actions
    // (clicking Archive in the web UI, Ctrl-C in a local terminal,
    // uncaught exception) reassign the reason before archive metadata is
    // written.

    function makeFakeSession() {
        const metadataWrites: Array<Record<string, unknown>> = []
        return {
            updateMetadata: vi.fn((handler: (m: Record<string, unknown>) => Record<string, unknown>) => {
                const next = handler({})
                metadataWrites.push(next)
                return next
            }),
            sendSessionDeath: vi.fn(),
            flush: vi.fn(async () => {}),
            close: vi.fn(async () => {}),
            metadataWrites
        }
    }

    describe('createRunnerLifecycle archiveReason defaults (tiann/hapi#914)', () => {
        it('uses Hub restart as the default archiveReason when no override is applied', async () => {
            const session = makeFakeSession()
            const lifecycle = createRunnerLifecycle({
                session: session as unknown as Parameters<typeof createRunnerLifecycle>[0]['session'],
                logTag: 'test'
            })

            await lifecycle.cleanup()

            expect(session.metadataWrites).toHaveLength(1)
            expect(session.metadataWrites[0]).toMatchObject({
                lifecycleState: 'archived',
                archivedBy: 'cli',
                archiveReason: 'Hub restart'
            })
        })

        it('writes the operator-supplied reason when setArchiveReason is called (e.g. KillSession RPC)', async () => {
            const session = makeFakeSession()
            const lifecycle = createRunnerLifecycle({
                session: session as unknown as Parameters<typeof createRunnerLifecycle>[0]['session'],
                logTag: 'test'
            })

            lifecycle.setArchiveReason('User terminated')
            await lifecycle.cleanup()

            expect(session.metadataWrites[0]).toMatchObject({
                archiveReason: 'User terminated'
            })
        })

        it('markCrash overrides the default reason to "Session crashed"', async () => {
            const session = makeFakeSession()
            const lifecycle = createRunnerLifecycle({
                session: session as unknown as Parameters<typeof createRunnerLifecycle>[0]['session'],
                logTag: 'test'
            })

            lifecycle.markCrash(new Error('boom'))
            await lifecycle.cleanup()

            expect(session.metadataWrites[0]).toMatchObject({
                archiveReason: 'Session crashed'
            })
        })

        // tiann/hapi#914 review feedback: runner-spawned children should not
        // get the 'Hub restart' label when the runner SIGTERMs them via
        // stop-session, webhook-timeout cleanup, or orphan cleanup. The runner
        // sets HAPI_DEFAULT_ARCHIVE_REASON on the child's spawn env so the
        // default reason is path-accurate without changing every kill site.
        it('honours HAPI_DEFAULT_ARCHIVE_REASON env as the default reason', async () => {
            const session = makeFakeSession()
            const original = process.env.HAPI_DEFAULT_ARCHIVE_REASON
            process.env.HAPI_DEFAULT_ARCHIVE_REASON = 'Stopped by runner'
            try {
                const lifecycle = createRunnerLifecycle({
                    session: session as unknown as Parameters<typeof createRunnerLifecycle>[0]['session'],
                    logTag: 'test'
                })

                await lifecycle.cleanup()

                expect(session.metadataWrites[0]).toMatchObject({
                    archiveReason: 'Stopped by runner'
                })
            } finally {
                if (original === undefined) {
                    delete process.env.HAPI_DEFAULT_ARCHIVE_REASON
                } else {
                    process.env.HAPI_DEFAULT_ARCHIVE_REASON = original
                }
            }
        })

        it('setArchiveReason still wins over HAPI_DEFAULT_ARCHIVE_REASON', async () => {
            const session = makeFakeSession()
            const original = process.env.HAPI_DEFAULT_ARCHIVE_REASON
            process.env.HAPI_DEFAULT_ARCHIVE_REASON = 'Stopped by runner'
            try {
                const lifecycle = createRunnerLifecycle({
                    session: session as unknown as Parameters<typeof createRunnerLifecycle>[0]['session'],
                    logTag: 'test'
                })

                lifecycle.setArchiveReason('User terminated')
                await lifecycle.cleanup()

                expect(session.metadataWrites[0]).toMatchObject({
                    archiveReason: 'User terminated'
                })
            } finally {
                if (original === undefined) {
                    delete process.env.HAPI_DEFAULT_ARCHIVE_REASON
                } else {
                    process.env.HAPI_DEFAULT_ARCHIVE_REASON = original
                }
            }
        })
    })
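For readers without the implementation open, the precedence these tests pin down can be sketched roughly as follows. This is an illustrative stand-in, not the PR's `createRunnerLifecycle`: `createLifecycleSketch`, `SessionLike`, and the `defaultReason` parameter are hypothetical names, and only the reason strings and the three metadata fields are taken from the tests above.

```typescript
// Hypothetical sketch of the archive-reason precedence exercised by the tests.
// The real createRunnerLifecycle takes an options object and does more work
// (flush, close, session-death notification); none of that is modelled here.
type Metadata = Record<string, unknown>

interface SessionLike {
    updateMetadata(handler: (current: Metadata) => Metadata): unknown
}

interface LifecycleSketch {
    setArchiveReason(reason: string): void
    markCrash(error: unknown): void
    cleanup(): Promise<void>
}

function createLifecycleSketch(
    session: SessionLike,
    defaultReason: string = 'Hub restart' // assumed default, per the first test
): LifecycleSketch {
    // Starts at the default; explicit callers overwrite it before cleanup runs.
    let archiveReason = defaultReason

    return {
        setArchiveReason(reason) {
            archiveReason = reason
        },
        markCrash(_error) {
            archiveReason = 'Session crashed'
        },
        async cleanup() {
            // Whatever reason is current at cleanup time is what gets archived.
            session.updateMetadata((current) => ({
                ...current,
                lifecycleState: 'archived',
                archivedBy: 'cli',
                archiveReason
            }))
        }
    }
}
```

In this sketch the last writer wins: an out-of-band SIGTERM that never calls `setArchiveReason` or `markCrash` archives with the default, which is exactly the behaviour the review comment below questions for runner-initiated stops.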
    @@ -0,0 +1,65 @@
    import { describe, expect, it, vi } from 'vitest'
    import { RPC_METHODS } from '@hapi/protocol/rpcMethods'
    import { registerKillSessionHandler } from './registerKillSessionHandler'

    // tiann/hapi#914: the KillSession RPC is the authoritative "user-terminated"
    // signal because the hub only sends it when the operator clicks Archive in
    // the web UI. Out-of-band SIGTERM (hub-restart cascade, host-level `kill`)
    // hits the SIGTERM signal handler in runnerLifecycle, which now keeps the
    // default reason 'Hub restart' so the audit trail stays correct.
    describe('registerKillSessionHandler (tiann/hapi#914)', () => {
        function makeRegistry() {
            const handlers = new Map<string, (params?: unknown) => unknown>()
            return {
                registerHandler: (method: string, handler: (params: unknown) => unknown) => {
                    handlers.set(method, handler as (params?: unknown) => unknown)
                },
                handlers
            }
        }

        it('stamps archiveReason=User terminated before triggering cleanupAndExit', async () => {
            const registry = makeRegistry()
            const lifecycle = {
                setArchiveReason: vi.fn(),
                cleanupAndExit: vi.fn(async () => {})
            }

            registerKillSessionHandler(
                registry as unknown as Parameters<typeof registerKillSessionHandler>[0],
                lifecycle
            )

            const handler = registry.handlers.get(RPC_METHODS.KillSession)
            expect(handler).toBeDefined()

            const result = await handler?.()
            expect(result).toEqual({ success: true, message: 'Killing hapi CLI process' })

            // setArchiveReason MUST be called BEFORE cleanupAndExit so the archive
            // metadata write reads the correct reason.
            const setReasonOrder = lifecycle.setArchiveReason.mock.invocationCallOrder[0]
            const cleanupOrder = lifecycle.cleanupAndExit.mock.invocationCallOrder[0]
            expect(setReasonOrder).toBeLessThan(cleanupOrder)
            expect(lifecycle.setArchiveReason).toHaveBeenCalledWith('User terminated')
            expect(lifecycle.cleanupAndExit).toHaveBeenCalled()
        })

        it('still works with the legacy `(cleanupAndExit: () => Promise<void>)` call shape', async () => {
            // Back-compat: runAgentSession.ts passes a bare closure as the second
            // argument instead of a lifecycle object. The handler should not crash
            // when setArchiveReason is absent.
            const registry = makeRegistry()
            const cleanupAndExit = vi.fn(async () => {})

            registerKillSessionHandler(
                registry as unknown as Parameters<typeof registerKillSessionHandler>[0],
                cleanupAndExit
            )

            const handler = registry.handlers.get(RPC_METHODS.KillSession)
            await handler?.()

            expect(cleanupAndExit).toHaveBeenCalled()
        })
    })
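The two call shapes covered above (a lifecycle object, or a bare `cleanupAndExit` closure) could be handled along these lines. This is a minimal sketch under stated assumptions, not the PR's `registerKillSessionHandler`: `registerKillSessionSketch`, `RegistryLike`, and `KILL_SESSION_METHOD` are hypothetical stand-ins (the real method key comes from `RPC_METHODS.KillSession`), and whether the real handler awaits cleanup before replying is not shown in the diff.

```typescript
// Hypothetical sketch: accept either a lifecycle object or a legacy closure,
// and stamp the archive reason before cleanup starts.
type CleanupFn = () => Promise<void>

interface KillLifecycle {
    setArchiveReason?: (reason: string) => void
    cleanupAndExit: CleanupFn
}

interface RegistryLike {
    registerHandler(method: string, handler: (params: unknown) => unknown): void
}

// Placeholder key for illustration only; not the real RPC method name.
const KILL_SESSION_METHOD = 'kill-session-placeholder'

function registerKillSessionSketch(
    registry: RegistryLike,
    target: KillLifecycle | CleanupFn
): void {
    // Normalise the legacy closure into the object shape once, up front.
    const lifecycle: KillLifecycle =
        typeof target === 'function' ? { cleanupAndExit: target } : target

    registry.registerHandler(KILL_SESSION_METHOD, async () => {
        // Reason first, so the archive write performed during cleanup sees it.
        lifecycle.setArchiveReason?.('User terminated')
        // Assumption: the sketch awaits cleanup; the real handler may not.
        await lifecycle.cleanupAndExit()
        return { success: true, message: 'Killing hapi CLI process' }
    })
}
```

Optional chaining on `setArchiveReason` is what keeps the legacy closure path from crashing, which is the property the second test checks.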
[Major] Runner-driven stop paths now get mislabeled as hub restarts. The new default makes every SIGTERM archive as `Hub restart` unless a caller stamps a different reason first. Web Archive now does that through `registerKillSessionHandler`, but `hapi runner stop-session`, runner webhook timeout cleanup, and orphan cleanup terminate child sessions with SIGTERM directly (`cli/src/runner/run.ts:267`, `cli/src/runner/run.ts:587`). Those are operator/runner actions, not hub restarts, yet this SIGTERM handler keeps the new default, so archived metadata becomes misleading and the audit-trail fix regresses another supported termination path.

Suggested fix: