@devin-ai-integration

Implement retry logic for all Lambda Labs API calls

Summary

This PR extends the retry pattern already used by GetInstanceTypes to every other Lambda Labs API call. Previously, GetInstanceTypes was the only method with retry logic, so critical operations such as instance creation, termination, and management could fail on transient network issues or API rate limits.

Changes made:

  • Added retry logic to 6 Lambda Labs API calls: AddSSHKey, LaunchInstance, GetInstance, TerminateInstance, ListInstances, and RestartInstance
  • Created individual retry-wrapped helper functions for each API call following the exact same pattern as getInstanceTypes
  • Used existing retry infrastructure: collections.RetryWithDataAndAttemptCount, getBackoff(), and handleAPIError()
  • Added proper lint annotations for deferred response body cleanup
  • Added missing collections import

Files modified:

  • internal/lambdalabs/v1/instance.go - Main implementation with 6 new retry-wrapped functions

Review & Testing Checklist for Human

  • Verify retry pattern consistency - Compare new retry functions against existing getInstanceTypes implementation to ensure identical patterns
  • Test end-to-end instance operations - Create, get, list, terminate, and reboot instances to verify all operations work correctly with retry logic
  • Validate error handling behavior - Ensure error types and messages haven't changed unexpectedly due to consistent handleAPIError usage
  • Check SSH key operations - Test instance creation with public keys to verify AddSSHKey retry logic works properly
  • Monitor for performance issues - Verify no infinite retry loops or excessive retry attempts that could impact performance

Recommended test plan: Create a test instance, perform various operations (reboot, get status), then terminate it while monitoring for any unexpected errors or timeouts.
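
The "no infinite retry loops" item can also be checked offline. The sketch below is one possible way to do that, under stated assumptions: `getWithRetry` is a hypothetical bounded-retry GET (not the PR's helper, and without backoff to keep it short), `fakeAPI` is a hypothetical in-memory transport that fails a configurable number of requests with 429, and the URL is a placeholder that is never actually contacted.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeAPI returns a client whose first failFirst requests get HTTP 429 and
// whose later requests get HTTP 200. hits counts every request it receives.
func fakeAPI(failFirst int, hits *int) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		*hits++
		code := http.StatusOK
		if *hits <= failFirst {
			code = http.StatusTooManyRequests
		}
		return &http.Response{
			StatusCode: code,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("")),
			Request:    r,
		}, nil
	})}
}

// getWithRetry is a hypothetical bounded-retry GET: it retries on 429 and 5xx,
// returns immediately on other errors, and never exceeds maxAttempts requests.
func getWithRetry(client *http.Client, url string, maxAttempts int) (int, error) {
	lastStatus := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			return attempt, err
		}
		// Drain and close the body so the connection could be reused.
		_, _ = io.Copy(io.Discard, resp.Body)
		_ = resp.Body.Close()
		switch {
		case resp.StatusCode < 400:
			return attempt, nil
		case resp.StatusCode != http.StatusTooManyRequests && resp.StatusCode < 500:
			return attempt, fmt.Errorf("non-retryable status %d", resp.StatusCode)
		}
		lastStatus = resp.StatusCode
	}
	return maxAttempts, fmt.Errorf("still failing after %d attempts (last status %d)", maxAttempts, lastStatus)
}
```

A check built on this would assert two things: a call that fails twice and then succeeds produces exactly three requests, and a call that always fails produces exactly `maxAttempts` requests and a non-nil error.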


Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    subgraph "Lambda Labs Provider"
        Instance["internal/lambdalabs/v1/<br/>instance.go"]:::major-edit
        InstanceType["internal/lambdalabs/v1/<br/>instancetype.go"]:::context
        Client["internal/lambdalabs/v1/<br/>client.go"]:::context
        Errors["internal/lambdalabs/v1/<br/>errors.go"]:::context
    end
    
    subgraph "Retry Infrastructure"
        Collections["internal/collections/<br/>RetryWithDataAndAttemptCount"]:::context
        Backoff["getBackoff()"]:::context
        ErrorHandler["handleAPIError()"]:::context
    end
    
    subgraph "API Operations"
        CreateOp["CreateInstance<br/>(LaunchInstance + AddSSHKey)"]:::major-edit
        GetOp["GetInstance"]:::major-edit
        ListOp["ListInstances"]:::major-edit
        TerminateOp["TerminateInstance"]:::major-edit
        RebootOp["RebootInstance<br/>(RestartInstance)"]:::major-edit
        GetTypesOp["GetInstanceTypes<br/>(existing pattern)"]:::context
    end
    
    Instance --> CreateOp
    Instance --> GetOp
    Instance --> ListOp
    Instance --> TerminateOp
    Instance --> RebootOp
    
    CreateOp --> Collections
    GetOp --> Collections
    ListOp --> Collections
    TerminateOp --> Collections
    RebootOp --> Collections
    GetTypesOp --> Collections
    
    Collections --> Backoff
    Collections --> ErrorHandler
    
    Client --> Backoff
    Errors --> ErrorHandler
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • Risk level: Medium - While following an established pattern, these changes affect critical instance lifecycle operations and weren't fully tested locally due to test suite issues
  • Pattern source: Implementation follows the exact same retry pattern used in the existing getInstanceTypes method
  • Backward compatibility: Should maintain existing API behavior while adding resilience to transient failures
  • Session info: Requested by Alec Fong (@theFong) in session https://app.devin.ai/sessions/c1ea1276c3ea4ff3a67886a42572fbd7

- Add retry logic to AddSSHKey, LaunchInstance, GetInstance, TerminateInstance, ListInstances, and RestartInstance API calls
- Use existing retry infrastructure: collections.RetryWithDataAndAttemptCount and getBackoff()
- Apply consistent error handling with handleAPIError for all retry-wrapped functions
- Follow same pattern as existing GetInstanceTypes retry implementation
- Add proper lint annotations for deferred response body cleanup

This ensures all Lambda Labs API calls have the same resilience and retry behavior,
improving reliability when dealing with transient network issues or API rate limits.

Co-Authored-By: Alec Fong <[email protected]>

@theFong theFong merged commit 8b0a3c4 into main Aug 9, 2025
2 checks passed