- 
                Notifications
    You must be signed in to change notification settings 
- Fork 151
Add poison state to sandbox #931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces sandbox poisoning functionality to prevent further operations when a sandbox is in an inconsistent state that could compromise memory safety. The sandbox becomes poisoned when guest functions abort/panic or when host-initiated execution cancellation occurs.
Key changes:
- Added poisoned state tracking with safety checks across all sandbox operations
- Implemented automatic poison detection for specific error types (GuestAborted, ExecutionCanceledByHost)
- Added recovery mechanisms through snapshot restoration or manual poison clearing
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description | 
|---|---|
| src/hyperlight_host/src/sandbox/initialized_multi_use.rs | Core implementation of poison state tracking, safety checks, and recovery mechanisms | 
| src/hyperlight_host/src/error.rs | Added PoisonedSandbox error variant with detailed documentation | 
| src/hyperlight_host/tests/integration_test.rs | Updated interrupt tests to clear poison state for continued execution | 
| src/hyperlight_host/src/sandbox/snapshot.rs | Added Debug trait to Snapshot struct | 
| src/hyperlight_host/src/mem/shared_mem_snapshot.rs | Added Debug trait to SharedMemorySnapshot struct | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
Changes LGTM. I left some clarifying comments, but overall good to go.
61713f3    to
    74a9260      
    Compare
  
    | | HyperlightError::NoMemorySnapshot | ||
| | HyperlightError::ParameterValueConversionFailure(_, _) | ||
| | HyperlightError::PEFileProcessingFailure(_) | ||
| | HyperlightError::PoisonedSandbox | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit strange to me that IsPoisoned doesn't return true when this is the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right it's confusing, but it is correct. PoisonedSandbox is what we return in case some other error already have caused poison state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which reminds we should probably better organize our errors into categories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't by definition poisoned then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because this error doesn't cause poison. PoisionedSandbox is what we returned if it is poisoned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the sandbox returns this error then poisoned() == true?  something feels off about not verifying that if this is returned and the internal state should also be verified to be correct.
| /// | ||
| /// **Data Structure Corruption**: | ||
| /// - Hash tables with inconsistent bucket counts vs. actual entries | ||
| /// - Linked lists with broken next/prev pointers | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these things we have that get corrupted? These seems like general information on corruption but not specific to what we are protecting against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes these are just general information warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would all of these fall under memory corruption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this list feels overly generic and distracts from the rest of the comment but can see what others think
Signed-off-by: Ludvig Liljenberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, a few nits but happy to see what other say
This PR adds a poison state to sandbox in order to prevent further operations when the sandbox is left in an inconsistent state that could compromise memory safety, data integrity, or security. The sandbox becomes poisoned when guest functions abort/panic or when host-initiated execution cancellation occurs, leaving behind leaked heap allocations, corrupted data structures, or unreleased resources. For example, interrupting execution while guest is allocating can leave the global allocation lock in an inconsistent state, making future allocations impossible in subsequent runs due to infinite locking/spinning.
Poisoned sandboxes will reject all further operations (guest calls, snapshots, memory mapping) until the inconsistent state is resolved through either restoring to a snapshot or manually (unsafely) clearing the poison state.
Closes #848