feat(tencent): add Tencent Cloud provider#165
Thanks for the PR, will review soon :)
rafeegnash
left a comment
Thanks for the huge body of work here — 10K lines across 15 phases is real engineering, and the Tencent provider integration is in great shape overall. The lazy per-service SDK clients, the tencent-api + filter maker verbs, the knownHallucinatedActions map, the credential resolution chain — all thoughtful and a clean fit with the codebase's conventions.
We'd love to take you up on your offer to split into smaller PRs — at this scale, three focused PRs would make this much easier to review carefully and give you faster turnaround on the parts that are already solid:
- Provider core: internal/tencent/* (the SDK client, per-service modules, security scans, Cobra commands, credential resolution). This is the strongest piece and could land first with minimal back-and-forth.
- Maker integration: internal/maker/exec_tencent*.go, tencent_prompts.go, the [*] placeholder change in exec.go. Most of our review notes are here.
- HTTP API: internal/api/* plus the cmd/server.go wiring. This is the layer that needs the most careful auth hardening (see notes below).
Totally fine if splitting isn't realistic — we can keep reviewing this PR as-is, just wanted to flag that the offer is welcome.
A few things to flag before re-review:
Branch is out of date
The merge base is 3e65a24 (feat: mcp #164, from late April). Since then master has merged #169–#174 (k8s helm, exec/apply/port-forward, node lifecycle, create aks, native MCP K8s tools, conversation history cap, SRE playbook framework). The diff currently shows those files as deletions — that's purely a rebase-conflict artifact, not anything you intended. When you get a chance, could you pull master and rebase? The conflict surface should mostly be cmd/root.go and cmd/ask.go where the new K8s subcommands register alongside what you've added.
Security notes on the HTTP API
These are the items we'd want fixed before the HTTP API piece lands (whether as part of this PR or in the split):
- internal/api/middleware.go:43 - token comparison should use crypto/subtle.ConstantTimeCompare to avoid timing-attack token recovery. Two-line fix, see inline comment.
- internal/api/server.go:67-69 - empty --token currently disables auth on every route including POST /api/v1/maker/apply. It only logs a warning, and the first help-text example shows running with no token, which feels like a footgun. Could we either refuse to start without a token (unless --insecure is explicit), or at least gate POST /maker/apply when the token is unset?
- internal/maker/exec_tencent_filter.go:182 - regexp.Compile(value) on user-supplied patterns has a ReDoS risk ((a+)+$ style patterns can spin a goroutine). A pattern-length cap + per-regex deadline (or regexp2 with timeout) would close this.
High-severity items worth addressing
- internal/api/routes.go:273 - the ?region= query param goes verbatim into the SDK client. Pinned to *.tencentcloudapi.com so not full SSRF, but enables enumeration and potentially unintended API charges. A simple regex validation (^[a-z]+-[a-z]+(-[0-9]+)?$) or matching against ListAllRegions would tighten this.
- internal/api/server.go:48 - CORSOrigin defaults to *. Bearer auth via header mitigates CSRF, but a http://localhost default with explicit override feels safer.
- internal/tencent/raw.go - the destructive denylist is prefix-based (Terminate|Delete|Destroy|Reset|Release|Discontinue), so CAM actions like AddUser, CreateAccessKey, AttachUserPolicy slip through without needing --destroyer. If LLM prompts could ever be influenced by fetched data, this widens the blast radius. Would you be open to an allowlist of read-only actions for non-destroyer mode?
Should-fix before merge
- internal/tencent/context.go - ctx is accepted on the gather functions but the Tencent SDK has no WithContext variants, so the context is effectively dead. Ctrl-C during clanker ask --tencent won't interrupt anything in-flight. Either a comment explaining the limitation or wiring HttpProfile.ReqTimeout per-call would help.
- internal/tencent/context.go - GetRelevantContext hardcodes Limit = 100 and silently truncates. Accounts with more than 100 of any resource type send incomplete data to the LLM. The CLI list commands paginate correctly. Could the context-gather layer do the same, or at least log (showing first 100) like the AWS path does?
- internal/maker/exec_tencent_filter.go - validateFilterCommand and filterMatch are non-trivial decision-tree code with no tests. A table-driven test would help guard against regressions.
Nits / parity notes
- internal/tencent/client.go lacks a NewClientWithBackendCredentials factory like other providers have. Fine for now since Tencent isn't wired into the backend credential flow yet, but worth adding for consistency.
- internal/api/routes.go:104-128 calls GetRelevantContext, discards the result, then re-calls gatherTencentByType, which doubles the API calls per request.
- Credentials struct (client.go:18) has no String() / MarshalJSON redaction. AWS has the same gap, but it is one %+v away from leaking SecretKey.
- internal/tencent/raw.go:80 - paramsJSON is unmarshalled with no per-field size cap. Effective cap is the 1 MiB body limit; 64 KiB would be plenty.
- No MCP tools for Tencent. Looks intentional for this PR's scope; mentioning so reviewers don't wonder.
What we really like
- Lazy per-service SDK clients match the Tencent SDK's design: clearly justified divergence from AWS's eager init.
- Generated files have the correct Code generated by ...; DO NOT EDIT. headers and the generator uses //go:build ignore.
- The reflection-based tag extractor in tags.go is bounded: no recursion, no panic risk on unusual SDK shapes.
- The --destroyer gate follows the existing maker convention.
- Security audits use only describe APIs: no port scanning of discovered endpoints.
- knownHallucinatedActions with "did you mean" hints is a nice UX touch for LLM-typo'd actions.
Happy to re-review piece by piece as the splits land (or this PR after a rebase), whichever is easier for you. Thanks again for taking this on.
| } | ||
| auth := r.Header.Get("Authorization") | ||
| const prefix = "Bearer " | ||
| if !strings.HasPrefix(auth, prefix) || strings.TrimSpace(auth[len(prefix):]) != s.cfg.Token { |
Critical — non-constant-time token comparison.
This is a timing side-channel an attacker can use to recover the token byte-by-byte by measuring response times across many requests. Practical over LAN, sometimes even over the open internet.
Suggested fix:
import "crypto/subtle"
// ...
// Compare as byte slices in constant time; the result is 1 only on an exact match.
provided := []byte(strings.TrimSpace(auth[len(prefix):]))
expected := []byte(s.cfg.Token)
if subtle.ConstantTimeCompare(provided, expected) != 1 {
	writeError(w, http.StatusUnauthorized, "unauthorized", "missing or invalid bearer token")
	return
}
(ConstantTimeCompare also returns 0 when the lengths differ, so this covers the length-mismatch case as well as a content mismatch.)
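For reference, a self-contained version of the suggested check could look roughly like the sketch below. The function name validBearer and its signature are illustrative and not taken from the PR; it also keeps the prefix guard from the original condition so the slice expression cannot go out of range.

```go
package main

import (
	"crypto/subtle"
	"strings"
)

// validBearer reports whether an Authorization header value carries the
// expected bearer token. Hypothetical helper, not the PR's actual code.
func validBearer(header, expected string) bool {
	const prefix = "Bearer "
	// Reject headers without the prefix before slicing past it.
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	// An empty expected token must never authenticate a request.
	if expected == "" {
		return false
	}
	provided := strings.TrimSpace(header[len(prefix):])
	// ConstantTimeCompare returns 1 only when both slices are identical.
	return subtle.ConstantTimeCompare([]byte(provided), []byte(expected)) == 1
}
```

A middleware would call validBearer(r.Header.Get("Authorization"), s.cfg.Token) and write a 401 when it returns false.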
| // Run starts the HTTP server and blocks until ctx is cancelled or | ||
| // ListenAndServe returns an error. | ||
| func (s *Server) Run(ctx context.Context) error { | ||
| if strings.TrimSpace(s.cfg.Token) == "" { |
Critical — empty token disables auth on every route, including POST /maker/apply.
With cfg.Token == "" the middleware short-circuits and lets every request through. That includes plan execution against live Tencent credentials. The first example in the help text shows clanker server --port 8080 with no token, so this is what operators will run by default in dev — and a malicious page loaded on the same machine could then call POST /maker/apply cross-origin (CORS defaults to *).
Could we either:
- Refuse to start without a token unless an explicit --insecure flag is passed, or
- At minimum, block POST /maker/apply (and any other mutating route) when the token is unset, even if read-only routes stay open?
The one-line WARNING log is easy to miss in a busy terminal.
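A minimal sketch of the first option, assuming a pre-flight check that runs before the listener starts. The names checkAuthConfig and errNoToken are hypothetical; only the --insecure flag comes from the suggestion above.

```go
package main

import (
	"errors"
	"strings"
)

// errNoToken is returned when the server would otherwise start unauthenticated.
var errNoToken = errors.New("refusing to start: no API token set (pass --insecure to override)")

// checkAuthConfig is a hypothetical startup guard. It allows startup when a
// token is present, or when the operator explicitly opted out of auth.
func checkAuthConfig(token string, insecure bool) error {
	if strings.TrimSpace(token) != "" {
		return nil
	}
	if insecure {
		return nil // explicit opt-out, the caller should still log loudly
	}
	return errNoToken
}
```

Failing at startup makes the unsafe configuration impossible to miss, which a single warning line does not.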
| if !ok { | ||
| return false | ||
| } | ||
| re, err := regexp.Compile(value) |
Critical — ReDoS via user-supplied regex.
value comes from the plan JSON, which in the HTTP path comes from the POST /api/v1/maker/apply body. A pattern like (a+)+$ against a long string will spin a goroutine that context.Done() cannot preempt — Go's regexp package doesn't support deadline-based cancellation.
Options:
- Pattern-length cap + complexity heuristic: reject patterns over ~256 chars or containing nested quantifiers ((a+)+, (a*)*, (a|a)*).
- regexp2 package: drop-in replacement with MatchTimeout support.
- Pre-validate via regexp/syntax.Parse and walk the AST rejecting unbounded backtracking.
Option 1 is the smallest change. Option 2 is most robust if you're OK adding a dep.
At minimum, bound the string length being matched against — if s is a huge SDK response field, the worst-case input grows.
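As a rough illustration of the AST-walking option, the sketch below flags a repetition nested inside another repetition. It is a heuristic under stated assumptions: the function names and the 256-char cap are illustrative, and it will also reject some harmless patterns. Note that Go's standard regexp package is documented to run in time linear in the input size, so this kind of check matters most if a backtracking engine such as regexp2 is swapped in.

```go
package main

import "regexp/syntax"

// isRepeat reports whether op is an unbounded or counted repetition.
func isRepeat(op syntax.Op) bool {
	return op == syntax.OpStar || op == syntax.OpPlus || op == syntax.OpRepeat
}

// hasNestedRepeat walks the parsed pattern and reports whether any
// repetition contains another repetition, e.g. (a+)+.
func hasNestedRepeat(re *syntax.Regexp, insideRepeat bool) bool {
	if isRepeat(re.Op) {
		if insideRepeat {
			return true
		}
		insideRepeat = true
	}
	for _, sub := range re.Sub {
		if hasNestedRepeat(sub, insideRepeat) {
			return true
		}
	}
	return false
}

// rejectPattern is a hypothetical pre-validation step: it rejects patterns
// that are too long, fail to parse, or contain nested repetition.
func rejectPattern(pattern string) bool {
	if len(pattern) > 256 {
		return true
	}
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return true
	}
	return hasNestedRepeat(re, false)
}
```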
| // tencentClient builds a Tencent client for this request. The region can be | ||
| // overridden per request via ?region=ap-jakarta; otherwise the daemon's | ||
| // default (resolved from config / env at startup) is used. | ||
| func (s *Server) tencentClient(r *http.Request) (*tencent.Client, error) { |
High — ?region= query param accepted without validation.
The value is passed verbatim into the Tencent SDK as the region, which becomes part of the request signing and the endpoint hostname. The hostname is pinned to *.tencentcloudapi.com by the SDK so it's not arbitrary SSRF, but a caller can:
- Enumerate which regions the credential has access to via differential responses.
- Cause unintended API charges by looping through bogus region strings.
Validate against ^[a-z]+-[a-z]+(-[0-9]+)?$ or check membership in ListAllRegions(). Cheap to add.
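A small sketch of the regex check, using the expression from this comment. The helper name validRegion and the 32-byte length bound are assumptions added for illustration.

```go
package main

import "regexp"

// regionPattern is the expression suggested in the review comment.
var regionPattern = regexp.MustCompile(`^[a-z]+-[a-z]+(-[0-9]+)?$`)

// validRegion is a hypothetical validator for the ?region= query parameter.
// The length bound is an assumed extra guard, not part of the suggestion.
func validRegion(region string) bool {
	return len(region) <= 32 && regionPattern.MatchString(region)
}
```

The handler would return 400 when validRegion fails, before any SDK client is built.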
| cfg.Addr = ":8080" | ||
| } | ||
| if cfg.CORSOrigin == "" { | ||
| cfg.CORSOrigin = "*" |
Medium — CORS default is *.
Bearer auth via Authorization header mitigates classic CSRF (browsers won't auto-send), but a wildcard default still feels too permissive — any origin can read responses on a successful auth, and combined with the empty-token case above, an unset token on a clanker server instance is reachable from any web page.
Suggest defaulting to http://localhost and requiring an explicit --cors-origin to relax it.
| @@ -0,0 +1,1467 @@ | |||
| package tencent | |||
Should-fix — ctx parameter is accepted but never propagated to the Tencent SDK.
The SDK's generated clients don't expose WithContext variants, so passing ctx here is purely cosmetic — a Ctrl-C during clanker ask --tencent won't interrupt anything in flight. The only function that respects the context is contextCOS via its own context.WithTimeout (which would still wait on the SDK call to return).
Two options:
- Document the limitation at the top of this file so the next person doesn't expect cancellation to work.
- Wire HttpProfile.ReqTimeout on each per-service profile.NewClientProfile() so calls at least have a hard ceiling.
Option 2 is the safer choice — without it a hung Tencent endpoint blocks the gather goroutine indefinitely.
| return out.String(), nil | ||
| } | ||
| // contextCVMs returns a compact JSON array of CVMs in this client's region. |
Should-fix — silent truncation at Limit = 100.
The gather functions cap at 100 items per resource type and don't paginate. An account with 150 CVMs sees only the first 100 in the LLM's context, with no signal that more exist. The CLI list commands (e.g. internal/tencent/cvm.go) paginate correctly with an offset loop, so the pattern exists in the codebase.
At minimum, surface the truncation: AWS does this with (showing first %d roles) in the error/info path. Ideally, paginate fully (or to a higher cap like 500) so the LLM gets the full picture for at least small-to-medium accounts.
| const defaultRegion = "ap-singapore" | ||
| // Credentials holds the resolved Tencent Cloud credentials and target region. | ||
| type Credentials struct { |
Nit — credentials struct has no redaction.
No String() / MarshalJSON method, so any %+v or %v formatting of Credentials (or any struct that embeds it, like Client) leaks SecretKey into logs. No current call site does this, but it's one careless debug line away. AWS has the same gap, so this is parity — but worth tightening on both sides:
func (c Credentials) String() string {
	return fmt.Sprintf("Credentials{SecretID: %q, Region: %q, SecretKey: <redacted>}", c.SecretID, c.Region)
}
| "data": []interface{}{}, | ||
| "warning": err.Error(), | ||
| }) | ||
| _ = raw |
Nit — GetRelevantContext result is discarded then gatherTencentByType is called separately, doubling the API calls.
The raw variable above is assigned and then the code path just continues to gatherTencentByType(...), which makes the same SDK calls a second time. Either consume raw here or drop the GetRelevantContext call entirely from this handler.
| // validateTencentCommand rejects anything that isn't a well-formed tencent-api | ||
| // call. Destructive actions (Terminate*, Delete*, Reset*) are gated behind | ||
| // --destroyer to match the policy applied to every other provider. | ||
| func validateTencentCommand(args []string, allowDestructive bool) error { |
Should-fix — no test coverage for validateTencentCommand or validateFilterCommand.
This is a security-relevant validator (rejects newlines, gates destructive actions, enforces arg-count bounds). A table-driven test covering each error path — empty command, wrong verb, missing service/action/region, newline in args, destructive without --destroyer, oversized arg count — would help prevent regressions. The same applies to filterMatch in exec_tencent_filter.go.
|
Cool, thank you for all the comments and fix recommendations. I will look into them right away.
Adds a Tencent Cloud provider to the Clanker CLI alongside the existing
AWS/GCP/Azure/Cloudflare/Fly/Verda/Vercel/Railway providers. Built up
in 15 phases over the past month against the bgdnvk/clanker upstream;
this PR consolidates the full provider as one contribution.
Coverage
* Compute cvm + lighthouse (lightweight cloud server)
* Network vpc, subnet, security-group + rule audit, eip, clb,
nat, vpn, ccn, direct-connect
* Storage cbs (Cloud Block Storage), cos (Object Storage)
* Database mysql (cdb), postgres, redis, mongodb, cynosdb (tdsql-c)
* Container tke clusters + kubeconfig fetch
* Edge cdn, edgeone, waf, anti-ddos
* Identity cam users
* Observability cloud monitor metrics (CVM + Lighthouse), cls log
topics, cloud audit tracks, alarm policies
* Cost monthly billing by product + top-N resources
* Tags flat map[string]string surfaced on all summary
structs that the SDK returns tags for (CVM, Lighthouse,
VPC, Postgres); reflection-based helper handles the
SDK's inconsistent tag-field naming across services.
Security audits
* public-cvm-exposure CVMs with sensitive ports open to 0.0.0.0/0
* clb-exposure public CLBs with risky listeners
* db-exposure managed DBs reachable from the public internet
* idle-eips EIPs unbound but billed
* unencrypted-cbs CBS volumes without encryption
* cert-expiry SSL certs expiring within N days
* cam-hygiene CAM users missing phone/email
* waf-coverage CDN/EdgeOne hosts not covered by WAF
* antiddos-coverage account anti-DDoS posture + per-region targets
* audit-coverage Cloud Audit tracks status
HTTP API
Hooks into the existing `clanker server` route table with bearer-auth
endpoints for inventory (/api/v1/tencent/resources/{type}), scans
(/api/v1/tencent/scan/{kind}), monitoring (/api/v1/tencent/metrics/
{cvm|lighthouse}), cost (/api/v1/tencent/cost/by-product, /resources),
and topology (/api/v1/tencent/topology). Maker plan and apply share
the existing /api/v1/maker/{plan,apply} endpoints.
Maker integration
* tencent-api verb 5-arg form [tencent-api, service, action,
region, json-params] dispatches to a generic
SendRaw over Tencent's CommonRequest signed
transport. No tccli dependency.
* tencent_prompts.go planner system prompt with chain shapes A-H,
anti-patterns, static-spec vs runtime
metrics rules, and the bgdnvk-style filter
verb example.
* filter verb new [filter, sourceIdx, arrayPath, field,
op, value] post-processor returns the
matching subset of a prior command's output.
Operators: > < >= <= == != contains
startsWith matches. Lets Maker answer
"find X by criteria" queries directly
instead of dumping full inventory.
* [*] array placeholders jsonPathString in internal/maker/exec.go
now handles $.X[*].Y wildcard paths,
binding to a JSON array literal so
"InstanceIds":<CVM_IDS> chains work.
* Action denylist knownHallucinatedActions catches common
LLM-invented Tencent action names with
"did you mean..." hints before the round-
trip (GetProductMetricData -> GetMonitorData,
cvm.ListInstances -> DescribeInstances, etc).
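A minimal sketch of how such a hint lookup might work. Only the two pairs quoted above are included; the function name hintFor and the key format are assumptions, not the PR's actual layout.

```go
package main

// knownHallucinated maps an invented action name to the real one.
// Keys are written exactly as quoted in the description above.
var knownHallucinated = map[string]string{
	"GetProductMetricData": "GetMonitorData",
	"cvm.ListInstances":    "DescribeInstances",
}

// hintFor is a hypothetical lookup. It tries the service-qualified key
// first, then the bare action name.
func hintFor(service, action string) (string, bool) {
	if s, ok := knownHallucinated[service+"."+action]; ok {
		return s, true
	}
	s, ok := knownHallucinated[action]
	return s, ok
}
```

When a hint is found, the caller can fail fast with a "did you mean" message instead of sending the request.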
Credentials
Reads in this order: viper tencent.{secret_id,secret_key,region},
then TENCENTCLOUD_SECRET_* (official Tencent SDK env names),
then TENCENT_SECRET_* short aliases. Default region ap-singapore.
Dependencies
* tencentcloud/tencentcloud-sdk-go/{cvm,vpc,cbs,clb,cdb,postgres,
redis,mongodb,tke,tag,cam,monitor,cls,billing,lighthouse,...}
The provider is fully read-by-default; write operations
(Create/Modify/Run) go through Maker's existing plan+apply gate, with
destructive Terminate/Delete/Reset/Release/Discontinue actions
additionally requiring the existing --destroyer flag.
Tencent's two monitor APIs disagree about the canonical dimension casing for namespace QCE/LIGHTHOUSE:
- DescribeBaseMetrics (metadata) reports "instanceid" (lowercase)
- GetMonitorData (data) accepts "InstanceId" (PascalCase)

We trusted DescribeBaseMetrics when building lighthouse.go, which is why every Lighthouse metric call has been returning Tencent's misleading "[InvalidParameterValue] : unauthorized operation or the instance has been destroyed". Tencent reuses the same error code for genuine permission gaps and lifecycle issues, which sent the debug down two days of false leads (CAM, agent install, account type, sub-user vs root, ...).

The user's CAM is genuinely AdministratorAccess. The Cloud Monitor agent is installed. The Tencent Console displays the metrics fine. The data was always there; Tencent's data API just rejected our spelling of the dimension name with a wildly inappropriate error code.

PascalCase is the same form CVM uses, so the fix is a one-character change to the lighthouseDimensionKey constant. Added a comment block explaining the discrepancy so the next person who reads this code doesn't repeat the investigation.

Verified live against lhins-fprj6w5h (ap-singapore):
  CpuUsage          0.80% (avg 0.96%, 59 samples)
  MemUsage          38.79% (avg 38.72%, 59 samples)
  DiskUsage         22.18% (60 samples)
  LighthouseOutpkg  1 (avg 1.22, 59 samples)
DescribeBillSummaryByProduct (the per-product cost call) returns RealCost but no tax field, so Clanker cost totals never matched the Tencent console tax-inclusive headline. RealCost is total consumption (voucher + cash + tax); the console headline is cash out of pocket.

Adds billFeeSummary(), which calls DescribeCostExplorerSummary with Dimensions=feeType, FeeType=cost, the only billing API that breaks out tax. BillByProductJSON now embeds a summary object:
  consumption      total RealCost (voucher + cash + tax)
  voucher          amount covered by vouchers
  cash_before_tax  cash portion, pre-tax
  tax              tax amount
  cash_incl_tax    cash_before_tax + tax (matches console headline)

The Detail item names are localized display strings; the cash line is "Total Amount After Discount (Excluding Tax)" which contains the word "tax", so the substring match checks "discount" before "tax" to avoid misclassifying the cash line as tax.

Verified against a real April 2026 bill:
  consumption      11,146.37
  voucher           5,701.68
  cash_before_tax   4,905.13
  tax                 539.56
  cash_incl_tax     5,444.70  = console "Total Cost (Incl Tax)"
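The ordering rule for the substring match can be sketched as follows. The function names are hypothetical, and the "voucher" keyword is an assumption about the voucher line's wording; only the cash line's display string is quoted from the message above.

```go
package main

import "strings"

// classifyFeeLine maps a localized display name to a summary bucket.
// "discount" is tested before "tax" because the cash line
// "Total Amount After Discount (Excluding Tax)" contains both words.
func classifyFeeLine(name string) string {
	n := strings.ToLower(name)
	switch {
	case strings.Contains(n, "voucher"): // assumed wording for the voucher line
		return "voucher"
	case strings.Contains(n, "discount"):
		return "cash_before_tax"
	case strings.Contains(n, "tax"):
		return "tax"
	default:
		return "other"
	}
}

// cashInclTax adds the cash and tax portions in integer cents,
// which avoids floating-point drift when summing currency.
func cashInclTax(cashCents, taxCents int64) int64 {
	return cashCents + taxCents
}
```

With the rounded figures above, 4,905.13 + 539.56 gives 5,444.69; the reported 5,444.70 presumably reflects summing before rounding. The three components also add up to the consumption total: 5,701.68 + 4,905.13 + 539.56 = 11,146.37.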
- DescribeVoucherInfo / DescribeVoucherUsageDetails: voucher inventory,
balances, per-voucher usage history, and a per-owner-UIN breakdown of
voucher spend (nominal - balance). Voucher APIs only answer on the
account's home region, so they use a region-aware billing client.
- VoucherByOwnerJSON: month-scoped voucher deduction grouped by the
owner account UIN of each billed resource, via DescribeBillResourceSummary
(the voucher APIs carry no per-record UIN).
- CLI: `clanker tencent cost vouchers` / `cost voucher-usage`.
- HTTP API: /cost/vouchers, /cost/voucher-usage/{id}, /cost/voucher-by-owner.
Extend the slim JSON shape returned by /api/v1/tencent/resources/* for every subscription-capable resource: CVM, Lighthouse, CBS, MySQL, Postgres, Redis, MongoDB, CynosDB, CLB, AntiDDoS. PREPAID entries now carry the renewal deadline so callers can see what is about to expire without a separate billing call. Tencent uses two billing-mode conventions (string vs int) and the int form has NO consistent mapping across services — CDB inverts vs the others. internal/tencent/charge_mode.go centralizes the normalization into 'PREPAID' / 'POSTPAID' strings so the JSON shape is uniform.
The maker plan executor (SendRaw) gated calls behind a hand-maintained service map in raw.go, which silently lagged behind the SDK: calling lighthouse.DescribeInstances failed with 'unsupported tencent service' even though every other code path (typed clients, dashboard) handled it fine. Nine other services (antiddos, billing, cdn, cloudaudit, cynosdb, dc, ssl, teo, waf) were also missing from the map.

Replace the static map with a go:generate-driven one. gen_services.go walks GOMODCACHE/.../tencentcloud-sdk-go/tencentcloud/*/v* and emits service_versions_gen.go with one entry per service (latest version when multiple are vendored). The cos sentinel is preserved via a manualOverrides map in the generator. Upgrading the SDK now just needs 'go generate ./internal/tencent/...'.

The error message in SendRaw now enumerates services from the generated map so it stays accurate.
Add auto_renew (*bool, omitempty) to the slim JSON returned by /api/v1/tencent/resources/* for every prepaid-capable resource where the SDK exposes the renew flag: CVM, Lighthouse, CBS, MySQL, Postgres, Redis, CynosDB, and CLB. Consumers can now check whether an expiring resource will auto-renew or needs manual action. Without the field, expiring auto-renewers would generate false-positive alerts.

Skipped MongoDB and AntiDDoS: their list endpoints don't expose the renew flag (would require a separate DescribeAutoRenew-style call).

Tencent's renew encoding has three flavors handled by new normalizers in charge_mode.go:
- string 'NOTIFY_AND_AUTO_RENEW' (CVM/CBS/Lighthouse) and 'AUTO_RENEW' (CLB nested in PrepaidAttributes)
- int64 1 (CDB/Redis/CynosDB)
- uint64 1 (Postgres)

Using *bool + omitempty so consumers can distinguish 'not on auto-renew' from 'no info available'.
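A sketch of how the *bool + omitempty shape behaves, under the assumption that any value other than the ones named above means "not on auto-renew". The type and function names are illustrative and not the contents of charge_mode.go.

```go
package main

import "encoding/json"

// resource is a cut-down stand-in for the slim JSON shape.
type resource struct {
	ID        string `json:"id"`
	AutoRenew *bool  `json:"auto_renew,omitempty"`
}

// autoRenewFromString handles the string encoding. A nil input means the
// SDK gave no information, so the result stays nil and the field is omitted.
func autoRenewFromString(flag *string) *bool {
	if flag == nil {
		return nil
	}
	v := *flag == "NOTIFY_AND_AUTO_RENEW" || *flag == "AUTO_RENEW"
	return &v
}

// autoRenewFromInt handles the integer encoding, where 1 means auto-renew.
func autoRenewFromInt(flag *int64) *bool {
	if flag == nil {
		return nil
	}
	v := *flag == 1
	return &v
}

// encode marshals a resource; the error is ignored because this
// struct cannot fail to marshal.
func encode(r resource) string {
	b, _ := json.Marshal(r)
	return string(b)
}
```

A pointer to false is still emitted as "auto_renew":false, because omitempty only drops the nil pointer. That is what lets consumers tell the two cases apart.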
Three critical findings from rafeegnash's review.

1. Constant-time token comparison (api/middleware.go)
   The previous '!=' compare leaked information through response timing, letting an attacker recover the bearer token byte-by-byte by measuring latency across many requests. Switch to crypto/subtle.ConstantTimeCompare.

2. Refuse to start without a token (api/server.go, cmd/server.go)
   An empty --token previously disabled auth on every route, including POST /api/v1/maker/apply which can mutate real cloud resources. The server now aborts startup unless --insecure is explicit (or CLANKER_API_TOKEN is set). Help text now shows the token-gated invocation as the default example.

3. Bound the filter 'matches' operator (maker/exec_tencent_filter.go)
   regexp.Compile took an unbounded user-controlled pattern via the maker plan. Go's RE2 is linear so a true ReDoS is not realistic, but we now cap pattern length at 256 chars and run MatchString under a 100ms wall-clock deadline as defense-in-depth. The filter value originates from LLM output and reaches us through the HTTP API, so tighter bounds keep that surface predictable.
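One way the length cap and wall-clock deadline from item 3 could be combined is sketched below. The function names are hypothetical; the 256-char and 100ms values come from the commit message. The match runs in a goroutine that cannot be cancelled, so the deadline only bounds how long the caller waits.

```go
package main

import (
	"errors"
	"regexp"
	"time"
)

const (
	maxPatternLen = 256
	matchDeadline = 100 * time.Millisecond
)

var errMatchTimeout = errors.New("regex match exceeded deadline")

// matchWithDeadline runs MatchString in a goroutine and stops waiting after d.
func matchWithDeadline(re *regexp.Regexp, s string, d time.Duration) (bool, error) {
	result := make(chan bool, 1) // buffered so a late sender never blocks
	go func() { result <- re.MatchString(s) }()
	select {
	case ok := <-result:
		return ok, nil
	case <-time.After(d):
		return false, errMatchTimeout
	}
}

// filterMatches is a hypothetical 'matches' operator: oversize patterns,
// malformed patterns, and timeouts all count as "no match".
func filterMatches(pattern, s string) bool {
	if len(pattern) > maxPatternLen {
		return false
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	ok, err := matchWithDeadline(re, s, matchDeadline)
	return err == nil && ok
}
```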
|
Thanks again @rafeegnash for the thorough review. First pass landed:
Rebase
Branch is now rebased onto current
Three critical security findings: all fixed
Next
Working through the High / Should-fix items now (region validation, CORS default, destructive-action allowlist, ctx propagation, pagination, filter tests). Will push those as a separate commit so re-review can split.
On the PR split: let me get the security pass done first, then I'll come back to you on whether to split. Some of the High-severity items overlap maker + API so the boundary isn't as clean as it looked initially.
Three high-severity items from rafeegnash's review.

1. Validate ?region= before it reaches the Tencent SDK (api/routes.go)
   Previously any string was passed verbatim into the client's region, enabling enumeration of arbitrary Tencent endpoints and potentially driving unintended API charges. Added a regex covering all current region prefixes (ap|na|eu|sa|cn)-name(-suffix)?, with a typed *errInvalidRegion so the handler chokepoint surfaces 400 instead of the catch-all 401. 23 handler call sites switched to a small writeTencentClientErr helper.

2. CORS default no longer wildcard (api/server.go, cmd/server.go)
   Default Access-Control-Allow-Origin was "*", letting any page read API responses. Bearer auth via header mitigates CSRF but a hostile origin could still siphon data when a user pastes their token there. Default is now http://localhost:4173 (the bundled dashboard); pass --cors-origin explicitly for non-localhost deployments.

3. Switch destructive check to read-only allowlist (maker/exec_tencent.go)
   isTencentDestructive previously prefix-matched only Terminate|Delete|Destroy|Reset|Release|Discontinue. CAM mutations like AddUser, CreateAccessKey, AttachUserPolicy slipped through without --destroyer. Flipped to an allowlist of read-only verb prefixes (Describe, Get, List, Query, Lookup, Search, Check, Inquiry). Anything else now requires --destroyer, so the check is fail-safe by default.

Behavior change to be aware of: verbs that don't match a read prefix (Create*, Run*, Add*, Modify*, Set*, Enable*, Bind*, Associate*, Allocate*, etc.) now require --destroyer. ResetInstancesPassword was previously whitelisted as 'only changes the password'. That whitelist is gone because a password reset locks out anyone using the previous credential, which is a security-affecting mutation.
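The allowlist flip in item 3 can be sketched in a few lines. The prefix list is the one given in the commit message; the function name requiresDestroyer is illustrative and is not the PR's isTencentDestructive.

```go
package main

import "strings"

// readOnlyPrefixes lists the verb prefixes treated as read-only.
var readOnlyPrefixes = []string{
	"Describe", "Get", "List", "Query", "Lookup", "Search", "Check", "Inquiry",
}

// requiresDestroyer reports whether an action needs the --destroyer flag.
// Anything that does not start with a read-only prefix is treated as
// mutating, so unknown or empty action names fail safe.
func requiresDestroyer(action string) bool {
	for _, p := range readOnlyPrefixes {
		if strings.HasPrefix(action, p) {
			return false
		}
	}
	return true
}
```

Prefix matching is coarse, so an action whose name merely begins with one of these words would pass; an exact per-action list would be stricter at the cost of maintenance.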
|
High-severity tier landed in
CORS default (
Destructive denylist → read-only allowlist (
Next batch (Should-fix tier) will be: |
Three items from rafeegnash's review.

1. ctx propagation + per-request timeout (tencent/profile.go, all clients)
   The Tencent SDK has no WithContext variants, so caller ctx cancellation cannot interrupt a request in flight. As a defense, every typed client now flows through newClientProfile(endpoint), which sets HttpProfile.ReqTimeout = 30s. Combined with a ctxDone() check between pagination pages and between GetRelevantContext sections, Ctrl-C now bounds the wall-clock cost of cancellation to the single in-flight SDK call (was: indefinite). All 22 profile.NewClientProfile() + Endpoint sites were converted to the helper; the now-unused profile import was stripped from those files.

2. Paginate GetRelevantContext past 100 (tencent/context.go)
   contextCVMs, contextVPCs, contextSecurityGroups now loop through pages with offset/limit until TotalCount is exhausted, capped at gatherMaxItems (1000) with a logGatherTruncated() warning when the cap fires. These are the highest-cardinality types, and production accounts commonly cross 100 here. Other gather functions still single-call at limit=100; bringing them up is mechanical follow-up.

3. Tests for filter validator and matcher (maker/exec_tencent_filter_test.go)
   Table-driven coverage for validateFilterCommand (arg count, sourceIdx, op enum) and filterMatch (every operator + every JSON value type). Includes the ReDoS-defense cases added in the critical-tier commit: oversize pattern returns false, malformed regex returns false, PCRE-style catastrophic-backtrack patterns don't hang.
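The pagination loop from item 2, with the between-pages context check from item 1, might look roughly like this. The SDK call is replaced by a stand-in fetchPage function, and all names except gatherMaxItems are hypothetical.

```go
package main

import "context"

const (
	pageSize       = 100
	gatherMaxItems = 1000
)

// fetchPage stands in for one SDK list call: it returns the items at
// [offset, offset+limit) and the total count reported by the API.
type fetchPage func(offset, limit int) (items []string, total int, err error)

// gatherAll pages until the total is exhausted, the cap is hit, or ctx is
// done. The bool result is true when the cap cut the result short.
func gatherAll(ctx context.Context, fetch fetchPage) ([]string, bool, error) {
	var out []string
	for offset := 0; ; offset += pageSize {
		// An in-flight call cannot be interrupted, so check between pages.
		if err := ctx.Err(); err != nil {
			return out, false, err
		}
		items, total, err := fetch(offset, pageSize)
		if err != nil {
			return out, false, err
		}
		out = append(out, items...)
		if len(out) >= gatherMaxItems {
			return out[:gatherMaxItems], total > gatherMaxItems, nil
		}
		if len(items) == 0 || offset+len(items) >= total {
			return out, false, nil
		}
	}
}

// fakeSource simulates an account holding total resources.
func fakeSource(total int) fetchPage {
	return func(offset, limit int) ([]string, int, error) {
		var items []string
		for i := offset; i < offset+limit && i < total; i++ {
			items = append(items, "res")
		}
		return items, total, nil
	}
}

// demo gathers from a simulated account and reports the item count.
func demo(total int) (int, bool, error) {
	out, truncated, err := gatherAll(context.Background(), fakeSource(total))
	return len(out), truncated, err
}
```

For a simulated account with 150 resources, demo returns all 150 with no truncation; with 2500 it returns 1000 and reports truncation, which is where a warning like logGatherTruncated() would fire.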
|
Should-fix tier landed in ctx + per-request timeout ( Pagination ( Filter tests ( Remaining from the review:
I can roll the nits into a follow-up if you want — they're all tiny and could land together. |
Four small items from the review.

1. Credentials redaction (tencent/client.go)
Added String() and MarshalJSON() on Credentials so %v / %+v / Println and json.Marshal all render SecretKey as **** instead of leaking the raw key. Direct field access (the SDK signature path) is unchanged.

2. Drop doubled gather call (api/routes.go)
handleTencentResources called GetRelevantContext (full multi-section gather) and discarded the result, then called gatherTencentByType for the requested type, doubling SDK calls per request. Removed the GetRelevantContext call.

3. NewClientWithCredentials factory for parity (tencent/client.go)
Added BackendTencentCredentials struct and NewClientWithCredentials constructor matching the shape AWS / GCP / Fly.io / etc. already use. Not wired into the backend credential flow yet; kept for consistency so the dispatch layer can treat Tencent the same as every other provider.

4. Per-field size cap on paramsJSON (tencent/raw.go)
maxParamsJSONBytes (256 KiB total) + maxParamsFieldBytes (64 KiB per string field, walked recursively into nested maps and slices). The effective cap was the 1 MiB HTTP body limit, which is far larger than any legitimate Tencent action payload; 64 KiB still fits user-data scripts and policy documents while rejecting accidentally-pasted dumps from an LLM plan.
|
Nits tier landed in
Status of the full review:
The only thing left from your notes is paginating the remaining low-cardinality gather functions (CAM, NAT, VPN, CCN, DC, etc.) — those typically have <10 items per account so they don't hit the 100-row truncation in practice, but I'm happy to do them if you'd prefer everything paginated for uniformity. Ready for another pass whenever you have a moment. Thanks again for the thorough first review. |
Cron-facing alert that walks every PREPAID-capable resource (CVM,
Lighthouse, CBS, MySQL, Postgres, Redis, MongoDB, CynosDB, CLB,
AntiDDoS, and SSL with --include-ssl) across the requested regions and
returns items at or below a renewal threshold.
• CLI: clanker tencent expiry --regions=ap-x,ap-y --threshold=14
Exit 0 = nothing flagged, 1 = items in window, 2 = already
expired. Drop-in for crontab + MAILTO, GitHub Actions, etc.
• HTTP: GET /api/v1/tencent/expiry?regions=&threshold=&manual_only=&include_ssl=
Returns the full report with counts breakdown (total / flagged
/ expired / auto_renew). Region query param is validated through
the same regex SendRaw uses, so SSRF-shaped inputs are rejected.
manual_only defaults to true so the cron only surfaces items that won't
auto-renew (the actionable subset); auto-renewers are still counted in
counts.auto_renew for visibility.
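The exit-code contract above (0 / 1 / 2) could be derived from the flagged items roughly as follows. This is only a sketch: the item type and exitCode function are hypothetical, and it assumes that "already expired" means a negative number of days left and that an expired item takes precedence over items merely inside the window.

```go
package main

import "fmt"

// item is a hypothetical row of the expiry report.
type item struct {
	Name     string
	DaysLeft int // negative means the resource has already expired (assumption)
}

// exitCode maps report contents to the documented codes:
// 0 = nothing flagged, 1 = items in window, 2 = already expired.
func exitCode(items []item, threshold int) int {
	code := 0
	for _, it := range items {
		switch {
		case it.DaysLeft < 0:
			return 2 // assumed precedence: expired wins over "in window"
		case it.DaysLeft <= threshold:
			code = 1
		}
	}
	return code
}

func main() {
	report := []item{{"cvm-a", 40}, {"redis-b", 9}}
	fmt.Println(exitCode(report, 14))
}
```

A cron wrapper can then branch on the process exit status without parsing the report body.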
rafeegnash
left a comment
Just walked the new revision end-to-end. Really nice work, this looks great overall. Every item from the first review is properly addressed:
- Rebase is clean (base is 2728110, no K8s deletions).
- Constant-time token compare: verified.
- Server refuses to start without a token (or explicit --insecure): built the binary and confirmed empirically.
- ReDoS defense on filter matches: pattern cap + goroutine-with-timeout on top of Go's RE2.
- Region regex chokepoint with the typed *errInvalidRegion.
- CORS default narrowed to http://localhost:4173.
- Destructive allowlist (Describe|Get|List|Query|Lookup|Search|Check|Inquiry): AddUser / CreateAccessKey / AttachUserPolicy correctly blocked without --destroyer.
- HttpProfile.ReqTimeout = 30s + ctxDone between pages.
- Pagination on CVM/VPC/SG up to 1000 with truncation warning.
- Filter tests covering happy + sad paths + ReDoS defense.
- Credential String() + MarshalJSON() redaction.
- Doubled GetRelevantContext call removed.
- NewClientWithBackendCredentials factory.
- paramsJSON size caps.
Also reviewed the new /api/v1/tencent/expiry endpoint from 1a23646 — it validates regions / threshold / booleans cleanly. 👍
One blocker I caught while running the built binary: a single-request crash in writeTencentClientErr from the security commit (see inline comment). Genuinely a tiny typo-class bug — one line change to fix — but since the credential-missing error path is the most common non-region failure, any prod restart without env vars set hits it. Worth a quick httptest covering "no creds → 401" to lock it in.
A few smaller items I noticed while looking, none are blockers:
- In internal/api/routes_maker.go:147, countDestructiveCommands still uses the old prefix denylist (Terminate|Delete|Destroy|Reset|Release|Discontinue) with the ResetInstancesPassword carve-out. Now that the executor is allowlist-based, the audit record undercounts: an AddUser blocked by --destroyer records as 0 destructive commands. Easy to fix by calling into isTencentDestructive from exec_tencent.go.
- In internal/api/middleware.go, failed bearer attempts return 401 but never reach logMiddleware (it's nested inside auth), so 401s are silent. A one-line log in the 401 branch would help for prod observability.
- The internal/api/server.go:28 comment still says "*" by default; it should reflect the new http://localhost:4173 default.
- The internal/maker/exec_tencent.go:124-126 docstring still describes the old prefix denylist; it should be updated to the allowlist.
- routes_plan.go returns 200 OK + a warning field when the LLM emits unparseable JSON. It would be cleaner as a 4xx so clients don't have to inspect the body to know it failed.
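To make the first point concrete, here is a rough sketch of an allowlist-based classifier shared between the executor gate and the audit count. The names isDestructive and countDestructive are hypothetical (the real isTencentDestructive signature is not shown in this thread), and it assumes that any action without a read-only prefix counts as destructive; the prefix list is the one quoted in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// readOnlyPrefixes is the allowlist quoted in the review.
var readOnlyPrefixes = []string{
	"Describe", "Get", "List", "Query", "Lookup", "Search", "Check", "Inquiry",
}

// isDestructive treats every action that does not start with a read-only
// prefix as requiring the destroyer gate (assumed semantics).
func isDestructive(action string) bool {
	for _, p := range readOnlyPrefixes {
		if strings.HasPrefix(action, p) {
			return false
		}
	}
	return true
}

// countDestructive reuses the same classifier, so the audit count cannot
// drift away from what the gate enforces.
func countDestructive(actions []string) int {
	n := 0
	for _, a := range actions {
		if isDestructive(a) {
			n++
		}
	}
	return n
}

func main() {
	plan := []string{"DescribeInstances", "AddUser", "CreateAccessKey", "ListUsers", "TerminateInstances"}
	fmt.Println(countDestructive(plan))
}
```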
If you fix the recursion + roll a quick test for it, the rest of the small stuff can either come along in the same commit or land as a tiny follow-up — totally up to you. Thanks again for the careful work on this round, the diff reads really well now.
writeError(w, http.StatusBadRequest, "invalid_region", err.Error())
return
}
writeTencentClientErr(w, err)
Tiny but critical bug from the security-tier cleanup — this branch recurses with the same args instead of returning the credential-failure response. Any non-*errInvalidRegion error (specifically the one tencent.NewClient returns when credentials are missing) hits this line and the function calls itself until the goroutine stack runs out.
Reproduced by building the PR + running clanker server --port 18080 --token x with no Tencent creds set, then curl -H 'Authorization: Bearer x' http://127.0.0.1:18080/api/v1/tencent/regions — the process dies with:
fatal error: stack overflow
github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...)
github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...)
github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...)
...
Single authenticated request → server gone. Worth catching because credential rotation in prod would hit this without anyone touching the server.
One-line fix matching the intent documented in the PR description:
func writeTencentClientErr(w http.ResponseWriter, err error) {
var ir *errInvalidRegion
if errors.As(err, &ir) {
writeError(w, http.StatusBadRequest, "invalid_region", err.Error())
return
}
writeError(w, http.StatusUnauthorized, "tencent_credentials", err.Error())
}
Would also be a good place to add a tiny httptest against any one Tencent handler with no creds set; that test would have caught this in CI.
Six items from rafeegnash's second-pass review. The first is a real production-killer; the rest are correctness + hygiene.

1. writeTencentClientErr no longer infinite-recurses (api/routes.go)
The catch-all branch from the security commit called itself instead of writeError, causing a stack overflow on the very first credential-missing request and taking the process with it. New regression test in api/routes_test.go (httptest + direct helper invocation, including the wrapped-error errors.As path) locks the fix in.

2. countDestructiveCommands uses the live classifier (api/routes_maker.go)
The plan audit count was still computed from the old prefix denylist, so AddUser / CreateAccessKey / AttachUserPolicy gated by --destroyer recorded as 0 destructive commands. Exported IsTencentDestructive from the maker package; routes_maker now delegates to it so the audit count never drifts away from what the executor's safety gate enforces.

3. 401s no longer silent (api/middleware.go)
authMiddleware wraps logMiddleware so a rejected request short-circuits before any access log fires. Added s.log401() that records method + path + remote_addr + reason on every failed bearer attempt, so prod credential rotations or auth attacks show up in stderr.

4. routes_plan returns 422 on unparseable LLM output (api/routes_plan.go)
Was 200 + 'warning' field, which forced clients to inspect the body to know it failed. 422 Unprocessable Entity is the right semantic: we understood the request but the upstream result was unprocessable. The raw cleaned text is still in data.plan so the dashboard editor can offer hand-correction.

5. Comments brought in line with code (api/server.go, maker/exec_tencent.go)
Config.CORSOrigin doc said '"*" by default'; now says http://localhost:4173. validateTencentCommand docstring described the old prefix denylist; now describes the allowlist.
|
Round-2 fixes landed in Recursion fix + regression test (
401 logging (
Stale comments updated
Let me know if you'd like the destructive-classifier moved to a shared location (it currently lives in |
|
Merged 🎉 Thank you for the careful work on this — really appreciate how thoroughly you walked through every item across the three review rounds. Recursion fix verified empirically against the built binary (same trigger that previously stack-overflowed now returns a clean 401 + JSON envelope, server stays up, 401 appears in the logs with method / URI / remote / reason). All five other items also addressed cleanly, and the new Tencent provider is now in master alongside the other clouds. Welcome aboard, and thanks again — this is great work to start from. 🙌 |
Upstream merged PR bgdnvk#165 (Tencent provider) and added work on top: k8s SRE playbooks (bgdnvk#174), SRE agent fix (bgdnvk#177), tree-wide gofmt -s (bgdnvk#176), README (bgdnvk#175), and three Tencent CLI features the fork lacked — `list --format json` (bgdnvk#179), `cost --format json` (bgdnvk#180), and security-scan CLI subcommands (bgdnvk#181). Conflict resolution: all 16 conflicts resolved to upstream's side. 14 were pure gofmt whitespace from bgdnvk#176 (identical code); billing.go and static_commands.go were upstream supersets adding the JSON/security CLI surface with no fork-unique code lost. Fixed a duplicate tencent import in cmd/ask.go left by the auto-merge. Verified in Docker (golang:1.25, -mod=mod): gofmt clean, go build ./..., go vet ./..., and go test ./... all pass.
Summary
Adds a Tencent Cloud provider alongside the existing AWS / GCP / Azure / Cloudflare / Fly / Verda / Vercel / Railway providers. Built over the past month in 15 phases on my fork; this PR consolidates the work as one contribution.
Opened as Draft so you can react before committing review time. Happy to split into multiple PRs (provider core / HTTP API / Maker integration) if that's preferred.
Coverage
- cvm + lighthouse (lightweight cloud server)
- vpc, subnet, security-group + rule audit, eip, clb, nat, vpn, ccn, direct-connect
- cbs (Cloud Block Storage), cos (Object Storage)
- mysql (cdb), postgres, redis, mongodb, cynosdb (tdsql-c)
- cdn, edgeone, waf, anti-ddos
- cam users
Security audits (10)
public-cvm-exposure, clb-exposure, db-exposure, idle-eips, unencrypted-cbs, cert-expiry, cam-hygiene, waf-coverage, antiddos-coverage, audit-coverage.
Maker integration
- tencent-api verb: 5-arg [tencent-api, service, action, region, json-params] dispatches via Tencent SDK's CommonRequest over signed transport. No ccli dependency.
- filter verb (new): [filter, sourceIdx, arrayPath, field, op, value] post-processor. Operators: > < >= <= == != contains startsWith matches. Lets Maker answer "find X by criteria" queries directly instead of returning full inventory.
- [*] array placeholders: jsonPathString in internal/maker/exec.go now handles $.X[*].Y wildcard paths, binding the placeholder to a JSON array literal so chains like "InstanceIds":<CVM_IDS> work.
- knownHallucinatedActions catches common LLM-invented action names with "did you mean..." hints before round-tripping to Tencent.
HTTP API
Hooks into the existing clanker server route table with bearer-auth endpoints under /api/v1/tencent/* (inventory, scans, monitoring, cost, topology). Maker plan/apply share the existing endpoints.
Credentials
Resolved in this order:
1. tencent.{secret_id,secret_key,region}
2. TENCENTCLOUD_SECRET_{ID,KEY} + TENCENTCLOUD_REGION (official Tencent SDK env names)
3. TENCENT_SECRET_{ID,KEY} + TENCENT_REGION (short aliases)
Default region: ap-singapore.
Stats
50 files changed, 9112 insertions(+), 4 deletions(-): almost entirely additive. Modifications are limited to: internal/maker/exec.go ([*] helper), internal/maker/exec_tencent_filter.go (new filter verb used by other providers too), and registration in cmd/{ask,root,server}.go.
Test plan
- clanker tencent cvm --region ap-singapore with valid creds
- GET /api/v1/tencent/resources/cvm?region=ap-singapore returns inventory
- clanker ask --maker "list CVMs with >2 vCPUs in ap-jakarta" produces a 2-cmd plan (DescribeInstances + filter)
- clanker tencent scan public-exposure --region ap-singapore
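As an illustration of the credential resolution order listed under Credentials, here is a minimal sketch. It is not the provider's implementation: the resolve function, the flat config map and the rule that a source is used only when both secret ID and secret key are present are assumptions; the key names, env var names and the ap-singapore default come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// Credentials holds one resolved set of Tencent credentials.
type Credentials struct{ SecretID, SecretKey, Region string }

const defaultRegion = "ap-singapore"

// resolve tries config keys first, then the official SDK env names, then the
// short aliases. cfg and getenv are injected so the order is easy to test.
func resolve(cfg map[string]string, getenv func(string) string) (Credentials, error) {
	candidates := []Credentials{
		{cfg["tencent.secret_id"], cfg["tencent.secret_key"], cfg["tencent.region"]},
		{getenv("TENCENTCLOUD_SECRET_ID"), getenv("TENCENTCLOUD_SECRET_KEY"), getenv("TENCENTCLOUD_REGION")},
		{getenv("TENCENT_SECRET_ID"), getenv("TENCENT_SECRET_KEY"), getenv("TENCENT_REGION")},
	}
	for _, c := range candidates {
		// Assumption: a source only counts when both halves of the key pair are set.
		if c.SecretID != "" && c.SecretKey != "" {
			if c.Region == "" {
				c.Region = defaultRegion
			}
			return c, nil
		}
	}
	return Credentials{}, errors.New("no Tencent credentials found")
}

func main() {
	env := map[string]string{
		"TENCENT_SECRET_ID":       "alias-id",
		"TENCENT_SECRET_KEY":      "alias-key",
		"TENCENTCLOUD_SECRET_ID":  "sdk-id",
		"TENCENTCLOUD_SECRET_KEY": "sdk-key",
	}
	getenv := func(k string) string { return env[k] }
	c, err := resolve(map[string]string{}, getenv)
	fmt.Println(c.SecretID, c.Region, err)
}
```

In real use getenv would be os.Getenv and cfg would come from the application's config loader.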