| 1 | +# How to Add a Cloud Provider |
| 2 | + |
| 3 | +This guide explains how to add a new cloud provider to the Brev Cloud SDK (v1). The Lambda Labs provider is the best working, well-tested example; use it as your canonical reference. |
| 4 | + |
| 5 | +Goals: |
| 6 | +- Implement a provider-specific CloudCredential (factory) and CloudClient (implementation) that satisfy pkg/v1 interfaces. |
| 7 | +- Accurately declare Capabilities based on the provider’s API surface. |
| 8 | +- Implement at least instance lifecycle and instance types, adhering to security requirements. |
| 9 | +- Add validation tests and (optionally) a GitHub Actions workflow to run them with real credentials. |
| 10 | + |
| 11 | +Helpful background: |
| 12 | +- Architecture overview: ../docs/ARCHITECTURE.md |
| 13 | +- Security requirements: ../docs/SECURITY.md |
| 14 | +- Validation testing framework: ../docs/VALIDATION_TESTING.md |
| 15 | +- v1 design notes: ../pkg/v1/V1_DESIGN_NOTES.md |
| 16 | + |
| 17 | +Provider examples: |
| 18 | +- Lambda Labs (canonical): ../internal/lambdalabs/v1/README.md |
| 19 | +- Nebius (in progress): ../internal/nebius/v1/README.md |
| 20 | +- Fluidstack (in progress): ../internal/fluidstack/v1/README.md |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## Core v1 Interfaces You Must Target |
| 25 | + |
| 26 | +CloudClient is a composed interface of provider capabilities. You only need to implement what your provider supports, but you must advertise Capabilities correctly. |
| 27 | + |
| 28 | +- CloudClient composition: ../pkg/v1/client.go |
| 29 | + - Key aggregation: CloudBase, CloudQuota, CloudRebootInstance, CloudStopStartInstance, CloudResizeInstanceVolume, CloudMachineImage, CloudChangeInstanceType, CloudModifyFirewall, CloudInstanceTags, UpdateHandler |
| 30 | +- Capabilities system: ../pkg/v1/capabilities.go |
| 31 | +- Instance lifecycle, validation helpers, and types: ../pkg/v1/instance.go |
| 32 | +- Instance types and validation helpers: ../pkg/v1/instancetype.go |
| 33 | + |
| 34 | +Patterns to follow: |
| 35 | +- Embed v1.NotImplCloudClient in your client so unsupported methods gracefully return ErrNotImplemented (see ../pkg/v1/notimplemented.go). |
| 36 | +- Accurately return capability flags that match your provider’s real API. |
| 37 | +- Prefer stable, provider-native identifiers; otherwise use MakeGenericInstanceTypeID/MakeGenericInstanceTypeIDFromInstance. |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## Directory Layout |
| 42 | + |
| 43 | +Create a new provider folder: |
| 44 | + |
| 45 | +- internal/{provider}/ |
| 46 | + - SECURITY.md (provider-specific notes; link to top-level security expectations) |
| 47 | + - CONTRIBUTE.md (optional provider integration notes) |
| 48 | + - v1/ |
| 49 | + - client.go (credentials and client) |
| 50 | + - instance.go (instance lifecycle + helpers) |
| 51 | + - instancetype.go (instance types) |
| 52 | + - capabilities.go (capability declarations) |
| 53 | + - networking.go, image.go, storage.go, tags.go, quota.go, location.go (as applicable) |
| 54 | + - validation_test.go (validation suite entry point) |
| 55 | + |
| 56 | +Use Lambda Labs as the pattern: |
| 57 | +- ../internal/lambdalabs/v1/client.go |
| 58 | +- ../internal/lambdalabs/v1/instance.go |
| 59 | +- ../internal/lambdalabs/v1/capabilities.go |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## Minimal Scaffold (Copy/Paste Template) |
| 64 | + |
| 65 | +Place in internal/{provider}/v1/client.go. Adjust names, imports, and fields for your provider. |
| 66 | + |
| 67 | +```go |
| 68 | +package v1 |
| 69 | + |
| 70 | +import ( |
| 71 | + "context" |
| 72 | + |
| 73 | + v1 "github.com/brevdev/cloud/pkg/v1" |
| 74 | +) |
| 75 | + |
| 76 | +type {Provider}Credential struct { |
| 77 | + RefID string |
| 78 | + // Add auth fields (e.g., APIKey, ClientID, Secret, Tenant, etc.) |
| 79 | +} |
| 80 | + |
| 81 | +var _ v1.CloudCredential = &{Provider}Credential{} |
| 82 | + |
| 83 | +func New{Provider}Credential(refID string /* auth fields */) *{Provider}Credential { |
| 84 | + return &{Provider}Credential{ |
| 85 | + RefID: refID, |
| 86 | + // ... |
| 87 | + } |
| 88 | +} |
| 89 | + |
| 90 | +func (c *{Provider}Credential) GetReferenceID() string { return c.RefID } |
| 91 | +func (c *{Provider}Credential) GetAPIType() v1.APIType { return v1.APITypeLocational /* or v1.APITypeGlobal */ } |
| 92 | +func (c *{Provider}Credential) GetCloudProviderID() v1.CloudProviderID { |
| 93 | + return "{provider-id}" // e.g., "lambdalabs" |
| 94 | +} |
| 95 | +func (c *{Provider}Credential) GetTenantID() (string, error) { |
| 96 | + // Derive stable tenant ID for quota/account scoping if possible |
| 97 | + return "", nil |
| 98 | +} |
| 99 | + |
| 100 | +// NOTE: (*{Provider}Credential).GetCapabilities is defined in capabilities.go. |
| 101 | +// Declaring it here as well would cause a duplicate-method compile error, |
| 102 | +// so keep a single definition. |
| 103 | + |
| 104 | +func (c *{Provider}Credential) MakeClient(ctx context.Context, location string) (v1.CloudClient, error) { |
| 105 | + // Create a client configured for a given location if locational API |
| 106 | + return New{Provider}Client(c.RefID /* auth fields */).MakeClient(ctx, location) |
| 107 | +} |
| 108 | + |
| 109 | +// ---------------- Client ---------------- |
| 110 | + |
| 111 | +type {Provider}Client struct { |
| 112 | + v1.NotImplCloudClient |
| 113 | + refID string |
| 114 | + location string |
| 115 | + // add http/sdk client fields, base URLs, etc. |
| 116 | +} |
| 117 | + |
| 118 | +var _ v1.CloudClient = &{Provider}Client{} |
| 119 | + |
| 120 | +func New{Provider}Client(refID string /* auth fields */) *{Provider}Client { |
| 121 | + return &{Provider}Client{ |
| 122 | + refID: refID, |
| 123 | + // init http/sdk clients here |
| 124 | + } |
| 125 | +} |
| 126 | + |
| 127 | +func (c *{Provider}Client) GetAPIType() v1.APIType { return v1.APITypeLocational /* or Global */ } |
| 128 | +func (c *{Provider}Client) GetCloudProviderID() v1.CloudProviderID { return "{provider-id}" } |
| 129 | +func (c *{Provider}Client) GetReferenceID() string { return c.refID } |
| 130 | +func (c *{Provider}Client) GetTenantID() (string, error) { return "", nil } |
| 131 | + |
| 132 | +func (c *{Provider}Client) MakeClient(_ context.Context, location string) (v1.CloudClient, error) { |
| 133 | + c.location = location |
| 134 | + return c, nil |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +Declare capabilities in internal/{provider}/v1/capabilities.go: |
| 139 | + |
| 140 | +```go |
| 141 | +package v1 |
| 142 | + |
| 143 | +import ( |
| 144 | + "context" |
| 145 | + |
| 146 | + v1 "github.com/brevdev/cloud/pkg/v1" |
| 147 | +) |
| 148 | + |
| 149 | +func get{Provider}Capabilities() v1.Capabilities { |
| 150 | + return v1.Capabilities{ |
| 151 | + v1.CapabilityCreateInstance, |
| 152 | + v1.CapabilityTerminateInstance, |
| 153 | + v1.CapabilityCreateTerminateInstance, |
| 154 | + // add others supported by your provider: reboot, stop/start, machine-image, tags, resize-volume, modify-firewall, etc. |
| 155 | + } |
| 156 | +} |
| 157 | + |
| 158 | +func (c *{Provider}Client) GetCapabilities(_ context.Context) (v1.Capabilities, error) { |
| 159 | + return get{Provider}Capabilities(), nil |
| 160 | +} |
| 161 | + |
| 162 | +func (c *{Provider}Credential) GetCapabilities(_ context.Context) (v1.Capabilities, error) { |
| 163 | + return get{Provider}Capabilities(), nil |
| 164 | +} |
| 165 | +``` |
| 166 | + |
| 167 | +Implement instance lifecycle in internal/{provider}/v1/instance.go (map to provider API): |
| 168 | + |
| 169 | +```go |
| 170 | +package v1 |
| 171 | + |
| 172 | +import ( |
| 173 | + "context" |
| 174 | + "fmt" |
| 175 | + |
| 176 | + v1 "github.com/brevdev/cloud/pkg/v1" |
| 177 | +) |
| 178 | + |
| 179 | +func (c *{Provider}Client) CreateInstance(ctx context.Context, attrs v1.CreateInstanceAttrs) (*v1.Instance, error) { |
| 180 | + // 1) ensure SSH key present (or inject via API) per ../docs/SECURITY.md |
| 181 | + // 2) map attrs to provider request (location, instance type, image, tags, firewall rules if supported) |
| 182 | + // 3) launch and return instance converted to v1.Instance |
| 183 | + return nil, fmt.Errorf("not implemented") |
| 184 | +} |
| 185 | + |
| 186 | +func (c *{Provider}Client) GetInstance(ctx context.Context, id v1.CloudProviderInstanceID) (*v1.Instance, error) { |
| 187 | + return nil, fmt.Errorf("not implemented") |
| 188 | +} |
| 189 | + |
| 190 | +func (c *{Provider}Client) ListInstances(ctx context.Context, args v1.ListInstancesArgs) ([]v1.Instance, error) { |
| 191 | + return nil, fmt.Errorf("not implemented") |
| 192 | +} |
| 193 | + |
| 194 | +func (c *{Provider}Client) TerminateInstance(ctx context.Context, id v1.CloudProviderInstanceID) error { |
| 195 | + return fmt.Errorf("not implemented") |
| 196 | +} |
| 197 | + |
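| 198 | +// Optional: define these only if the provider supports them. Otherwise omit them so the embedded v1.NotImplCloudClient returns ErrNotImplemented. |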
| 199 | +func (c *{Provider}Client) RebootInstance(ctx context.Context, id v1.CloudProviderInstanceID) error { return fmt.Errorf("not implemented") } |
| 200 | +func (c *{Provider}Client) StopInstance(ctx context.Context, id v1.CloudProviderInstanceID) error { return fmt.Errorf("not implemented") } |
| 201 | +func (c *{Provider}Client) StartInstance(ctx context.Context, id v1.CloudProviderInstanceID) error { return fmt.Errorf("not implemented") } |
| 202 | + |
| 203 | +// Merge strategies (pass-through is acceptable baseline). |
| 204 | +func (c *{Provider}Client) MergeInstanceForUpdate(_ v1.Instance, newInst v1.Instance) v1.Instance { return newInst } |
| 205 | +func (c *{Provider}Client) MergeInstanceTypeForUpdate(_ v1.InstanceType, newIt v1.InstanceType) v1.InstanceType { return newIt } |
| 206 | +``` |
| 207 | + |
| 208 | +See the canonical mapping and conversion logic in Lambda Labs: |
| 209 | +- Create/terminate/list/reboot: ../internal/lambdalabs/v1/instance.go |
| 210 | +- Capabilities: ../internal/lambdalabs/v1/capabilities.go |
| 211 | +- Client/credential + NotImpl: ../internal/lambdalabs/v1/client.go |
| 212 | + |
| 213 | +Implement instance types in internal/{provider}/v1/instancetype.go: |
| 214 | + |
| 215 | +- Implement: |
| 216 | + - GetInstanceTypes(ctx, args GetInstanceTypeArgs) ([]InstanceType, error) |
| 217 | + - GetInstanceTypePollTime() time.Duration |
| 218 | +- Use stable IDs if the provider offers them. If not, use MakeGenericInstanceTypeID. |
| 219 | +- Validate with helpers: |
| 220 | + - ValidateGetInstanceTypes: ../pkg/v1/instancetype.go |
| 221 | + - ValidateLocationalInstanceTypes: ../pkg/v1/instancetype.go |
| 222 | + - ValidateStableInstanceTypeIDs (if you maintain a stable ID list) |
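
To make the stable-ID idea concrete, here is one plausible shape for such a check. It is a sketch under assumptions, not the SDK's `ValidateStableInstanceTypeIDs`: `missingStableIDs` is a hypothetical helper, IDs are simplified to strings, and the example IDs are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// missingStableIDs returns, in sorted order, every ID from the maintained
// stable list that is absent from the IDs the provider currently reports.
func missingStableIDs(stable, fetched []string) []string {
	seen := make(map[string]struct{}, len(fetched))
	for _, id := range fetched {
		seen[id] = struct{}{}
	}

	var missing []string
	for _, id := range stable {
		if _, ok := seen[id]; !ok {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	stable := []string{"gpu-small", "gpu-large"}              // hypothetical maintained list
	fetched := []string{"gpu-small", "gpu-medium", "cpu-tiny"} // hypothetical provider response

	// Any ID reported here has disappeared or been renamed upstream.
	fmt.Println(missingStableIDs(stable, fetched))
}
```

A non-empty result suggests that an ID you treat as stable has changed on the provider side, which is the kind of drift a stable-ID list is meant to catch.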
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +## Capabilities: Be Precise |
| 227 | + |
| 228 | +Capability flags live in ../pkg/v1/capabilities.go. Only include capabilities your API actually supports. For example, Lambda Labs: |
| 229 | +- Supports create, terminate, and reboot instance |
| 230 | +- Does not (currently) support stop/start, resize volume, machine image, or tags |
| 231 | + |
| 232 | +Reference: |
| 233 | +- Lambda capabilities: ../internal/lambdalabs/v1/capabilities.go |
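
Accurate flags matter because callers can gate operations on the declared list. The sketch below illustrates that with local stand-in types. `capability`, `capabilities`, the constant values, and the `has` helper are all assumptions made for illustration; check ../pkg/v1/capabilities.go for the real definitions.

```go
package main

import "fmt"

// capability and capabilities are local stand-ins for v1.Capability
// and v1.Capabilities; the string values below are illustrative only.
type capability string
type capabilities []capability

const (
	capCreateInstance    capability = "create-instance"
	capTerminateInstance capability = "terminate-instance"
	capRebootInstance    capability = "reboot-instance"
	capStopStartInstance capability = "stop-start-instance"
)

// has reports whether want appears in the declared capability list.
func (cs capabilities) has(want capability) bool {
	for _, c := range cs {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	// Mirrors the Lambda Labs example: reboot is declared, stop/start is not.
	declared := capabilities{capCreateInstance, capTerminateInstance, capRebootInstance}

	fmt.Println(declared.has(capRebootInstance))    // declared, so callers may reboot
	fmt.Println(declared.has(capStopStartInstance)) // not declared, so callers should not try
}
```

Over-declaring a capability leads callers to invoke methods that fail at runtime, while under-declaring hides features the provider does support.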
| 234 | + |
| 235 | +--- |
| 236 | + |
| 237 | +## Security Requirements |
| 238 | + |
| 239 | +All providers must conform to ../docs/SECURITY.md: |
| 240 | +- Default deny all inbound, allow all outbound |
| 241 | +- SSH server must be available with key-based auth |
| 242 | +- Firewall rules should be explicitly configured via FirewallRule when supported |
| 243 | +- If your provider’s firewall model is global/project-scoped rather than per-instance, document limitations in internal/{provider}/SECURITY.md and reflect that by omitting CapabilityModifyFirewall if applicable. |
| 244 | + |
| 245 | +Provider-specific security doc examples: |
| 246 | +- Lambda Labs: ../internal/lambdalabs/SECURITY.md |
| 247 | +- Nebius: ../internal/nebius/SECURITY.md |
| 248 | +- Fluidstack: ../internal/fluidstack/v1/SECURITY.md |
| 249 | + |
| 250 | +--- |
| 251 | + |
| 252 | +## Validation Testing and CI |
| 253 | + |
| 254 | +Use the shared validation suite to test your provider with real credentials. |
| 255 | + |
| 256 | +- Validation framework and instructions: ../docs/VALIDATION_TESTING.md |
| 257 | +- Shared package: ../internal/validation/suite.go |
| 258 | + |
| 259 | +Steps: |
| 260 | +1) Create internal/{provider}/v1/validation_test.go: |
| 261 | + |
| 262 | +```go |
| 263 | +package v1 |
| 264 | + |
| 265 | +import ( |
| 266 | + "os" |
| 267 | + "testing" |
| 268 | + |
| 269 | + "github.com/brevdev/cloud/internal/validation" |
| 270 | +) |
| 271 | + |
| 272 | +func TestValidationFunctions(t *testing.T) { |
| 273 | + if testing.Short() { |
| 274 | + t.Skip("Skipping validation tests in short mode") |
| 275 | + } |
| 276 | + |
| 277 | + apiKey := os.Getenv("YOUR_PROVIDER_API_KEY") |
| 278 | + if apiKey == "" { |
| 279 | + t.Skip("YOUR_PROVIDER_API_KEY not set, skipping validation tests") |
| 280 | + } |
| 281 | + |
| 282 | + cfg := validation.ProviderConfig{ |
| 283 | + Credential: New{Provider}Credential("validation-test" /* auth fields from env, e.g., apiKey */), |
| 284 | + } |
| 285 | + validation.RunValidationSuite(t, cfg) |
| 286 | +} |
| 287 | +``` |
| 288 | + |
| 289 | +2) Local runs: |
| 290 | +- make test # skips validation (short) |
| 291 | +- make test-validation # runs validation (long) |
| 292 | +- make test-all # runs everything |
| 293 | + |
| 294 | +3) CI workflow (recommended): |
| 295 | +- Add .github/workflows/validation-{provider}.yml (copy Lambda Labs workflow if available or follow VALIDATION_TESTING.md). |
| 296 | +- Store secrets in GitHub Actions (e.g., YOUR_PROVIDER_API_KEY). |
| 297 | + |
| 298 | +--- |
| 299 | + |
| 300 | +## Checklist |
| 301 | + |
| 302 | +- [ ] Add internal/{provider}/v1 with client.go, instance.go, capabilities.go, instancetype.go |
| 303 | +- [ ] Embed v1.NotImplCloudClient in client and only implement supported methods |
| 304 | +- [ ] Accurately set Capabilities |
| 305 | +- [ ] Implement instance types with stable IDs where possible |
| 306 | +- [ ] Conform to security model; document provider-specific nuances |
| 307 | +- [ ] Add validation_test.go and (optionally) CI workflow |
| 308 | +- [ ] Run make lint and make test locally |
| 309 | +- [ ] Add provider docs (README.md under provider folder) describing API mapping and feature coverage |
| 310 | + |
| 311 | +--- |
| 312 | + |
| 313 | +## References |
| 314 | + |
| 315 | +- Architecture: ../docs/ARCHITECTURE.md |
| 316 | +- Security: ../docs/SECURITY.md |
| 317 | +- Validation testing: ../docs/VALIDATION_TESTING.md |
| 318 | +- CloudClient and composition: ../pkg/v1/client.go |
| 319 | +- Capabilities: ../pkg/v1/capabilities.go |
| 320 | +- Instance lifecycle and validations: ../pkg/v1/instance.go |
| 321 | +- Instance types and validations: ../pkg/v1/instancetype.go |
| 322 | +- Lambda Labs example: |
| 323 | + - Client/Credential: ../internal/lambdalabs/v1/client.go |
| 324 | + - Capabilities: ../internal/lambdalabs/v1/capabilities.go |
| 325 | + - Instance operations: ../internal/lambdalabs/v1/instance.go |
| 326 | + - Provider README: ../internal/lambdalabs/v1/README.md |