feat(tencent): add Tencent Cloud provider by rephapeng · Pull Request #165 · bgdnvk/clanker

rephapeng · 2026-05-18T13:31:30Z

Summary

Adds a Tencent Cloud provider alongside the existing AWS / GCP / Azure / Cloudflare / Fly / Verda / Vercel / Railway providers. Built over the past month in 15 phases on my fork; this PR consolidates the work as one contribution.

Opened as Draft so you can react before committing review time. Happy to split into multiple PRs (provider core / HTTP API / Maker integration) if that's preferred.

Coverage

Compute — cvm + lighthouse (lightweight cloud server)
Network — vpc, subnet, security-group + rule audit, eip, clb, nat, vpn, ccn, direct-connect
Storage — cbs (Cloud Block Storage), cos (Object Storage)
Database — mysql (cdb), postgres, redis, mongodb, cynosdb (tdsql-c)
Container — TKE clusters + kubeconfig fetch
Edge — cdn, edgeone, waf, anti-ddos
Identity — cam users
Observability — Cloud Monitor metrics (CVM + Lighthouse), CLS log topics, Cloud Audit tracks, alarm policies
Cost — monthly billing by product + top-N resources
Tags — surfaced on summary structs via a reflection helper that handles the SDK's inconsistent tag-field names (Tag{Key,Value}, ResourceTag{TagKey,TagValue}, TagSet, TagList, ...)

Security audits (10)

public-cvm-exposure, clb-exposure, db-exposure, idle-eips, unencrypted-cbs, cert-expiry, cam-hygiene, waf-coverage, antiddos-coverage, audit-coverage.

Maker integration

tencent-api verb — 5-arg [tencent-api, service, action, region, json-params] dispatches via Tencent SDK's CommonRequest over signed transport. No tccli dependency.
filter verb (new) — [filter, sourceIdx, arrayPath, field, op, value] post-processor. Operators: > < >= <= == != contains startsWith matches. Lets Maker answer "find X by criteria" queries directly instead of returning full inventory.
[*] array placeholders — jsonPathString in internal/maker/exec.go now handles $.X[*].Y wildcard paths, binding the placeholder to a JSON array literal so chains like "InstanceIds":<CVM_IDS> work.
Action denylist — knownHallucinatedActions catches common LLM-invented action names with "did you mean..." hints before round-tripping to Tencent.
Prompt — chain shapes A–H, anti-patterns, static-spec vs runtime metrics rules.

HTTP API

Hooks into the existing clanker server route table with bearer-auth endpoints under /api/v1/tencent/* (inventory, scans, monitoring, cost, topology). Maker plan/apply share the existing endpoints.

Credentials

Resolved in this order:

viper tencent.{secret_id,secret_key,region}
TENCENTCLOUD_SECRET_{ID,KEY} + TENCENTCLOUD_REGION (official Tencent SDK env names)
TENCENT_SECRET_{ID,KEY} + TENCENT_REGION (short aliases)

Default region: ap-singapore.

Stats

50 files changed, 9112 insertions(+), 4 deletions(-) — almost entirely additive. Modifications are limited to: internal/maker/exec.go ([*] helper), internal/maker/exec_tencent_filter.go (new filter verb used by other providers too), and registration in cmd/{ask,root,server}.go.

Test plan

Build green on Go 1.25
Smoke: clanker tencent cvm --region ap-singapore with valid creds
HTTP: GET /api/v1/tencent/resources/cvm?region=ap-singapore returns inventory
Maker: clanker ask --maker "list CVMs with >2 vCPUs in ap-jakarta" produces a 2-cmd plan (DescribeInstances + filter)
Security scan: clanker tencent scan public-exposure --region ap-singapore

bgdnvk · 2026-05-23T06:35:00Z

Thanks for the PR, will review soon :)

rafeegnash

Thanks for the huge body of work here — 10K lines across 15 phases is real engineering, and the Tencent provider integration is in great shape overall. The lazy per-service SDK clients, the tencent-api + filter maker verbs, the knownHallucinatedActions map, the credential resolution chain — all thoughtful and a clean fit with the codebase's conventions.

We'd love to take you up on your offer to split into smaller PRs — at this scale, three focused PRs would make this much easier to review carefully and give you faster turnaround on the parts that are already solid:

Provider core — internal/tencent/* (the SDK client, per-service modules, security scans, Cobra commands, credential resolution). This is the strongest piece and could land first with minimal back-and-forth.
Maker integration — internal/maker/exec_tencent*.go, tencent_prompts.go, the [*] placeholder change in exec.go. Most of our review notes are here.
HTTP API — internal/api/* plus the cmd/server.go wiring. This is the layer that needs the most careful auth hardening (see notes below).

Totally fine if splitting isn't realistic — we can keep reviewing this PR as-is, just wanted to flag that the offer is welcome.

A few things to flag before re-review:

Branch is out of date

The merge base is 3e65a24 (feat: mcp #164, from late April). Since then master has merged #169–#174 (k8s helm, exec/apply/port-forward, node lifecycle, create aks, native MCP K8s tools, conversation history cap, SRE playbook framework). The diff currently shows those files as deletions — that's purely a rebase-conflict artifact, not anything you intended. When you get a chance, could you pull master and rebase? The conflict surface should mostly be cmd/root.go and cmd/ask.go where the new K8s subcommands register alongside what you've added.

Security notes on the HTTP API

These are the items we'd want fixed before the HTTP API piece lands (whether as part of this PR or in the split):

internal/api/middleware.go:43 — token comparison should use crypto/subtle.ConstantTimeCompare to avoid timing-attack token recovery. Two-line fix, see inline comment.
internal/api/server.go:67-69 — empty --token currently disables auth on every route including POST /api/v1/maker/apply. It only logs a warning, and the first help-text example shows running with no token, which feels like a footgun. Could we either refuse to start without a token (unless --insecure is explicit), or at least gate POST /maker/apply when the token is unset?
internal/maker/exec_tencent_filter.go:182 — regexp.Compile(value) on user-supplied patterns has a ReDoS risk ((a+)+$ style patterns can spin a goroutine). A pattern-length cap + per-regex deadline (or regexp2 with timeout) would close this.

High-severity items worth addressing

internal/api/routes.go:273 — the ?region= query param goes verbatim into the SDK client. Pinned to *.tencentcloudapi.com so not full SSRF, but enables enumeration and potentially unintended API charges. A simple regex validation (^ap-[a-z]+-[0-9]*$) or matching against ListAllRegions would tighten this.
internal/api/server.go:48 — CORSOrigin defaults to *. Bearer auth via header mitigates CSRF, but a http://localhost default with explicit override feels safer.
internal/tencent/raw.go — the destructive denylist is prefix-based (Terminate|Delete|Destroy|Reset|Release|Discontinue), so CAM actions like AddUser, CreateAccessKey, AttachUserPolicy slip through without needing --destroyer. If LLM prompts could ever be influenced by fetched data, this widens the blast radius. Would you be open to an allowlist of read-only actions for non-destroyer mode?

Should-fix before merge

internal/tencent/context.go — ctx is accepted on the gather functions but the Tencent SDK has no WithContext variants, so the context is effectively dead. Ctrl-C during clanker ask --tencent won't interrupt anything in-flight. Either a comment explaining the limitation or wiring HttpProfile.ReqTimeout per-call would help.
internal/tencent/context.go — GetRelevantContext hardcodes Limit = 100 and silently truncates. Accounts with more than 100 of any resource type send incomplete data to the LLM. The CLI list commands paginate correctly — could the context-gather layer do the same, or at least log (showing first 100) like the AWS path does?
internal/maker/exec_tencent_filter.go — validateFilterCommand and filterMatch are non-trivial decision-tree code with no tests. A table-driven test would help guard against regressions.

Nits / parity notes

internal/tencent/client.go lacks a NewClientWithBackendCredentials factory like other providers have — fine for now since Tencent isn't wired into the backend credential flow yet, but worth adding for consistency.
internal/api/routes.go:104-128 calls GetRelevantContext, discards the result, then re-calls gatherTencentByType — doubled API calls per request.
Credentials struct (client.go:18) has no String() / MarshalJSON redaction. AWS has the same gap, but one %+v away from leaking SecretKey.
internal/tencent/raw.go:80 — paramsJSON is unmarshalled with no per-field size cap. Effective cap is the 1 MiB body limit; 64 KiB would be plenty.
No MCP tools for Tencent — looks intentional for this PR's scope; mentioning so reviewers don't wonder.

What we really like

Lazy per-service SDK clients match the Tencent SDK's design — clearly justified divergence from AWS's eager init.
Generated files have the correct Code generated by ...; DO NOT EDIT. headers and the generator uses //go:build ignore.
The reflection-based tag extractor in tags.go is bounded — no recursion, no panic risk on unusual SDK shapes.
The --destroyer gate follows the existing maker convention.
Security audits use only describe APIs — no port scanning of discovered endpoints.
knownHallucinatedActions with "did you mean" hints is a nice UX touch for LLM-typo'd actions.

Happy to re-review piece by piece as the splits land (or this PR after a rebase), whichever is easier for you. Thanks again for taking this on.

rafeegnash · 2026-05-24T13:31:22Z

+		}
+		auth := r.Header.Get("Authorization")
+		const prefix = "Bearer "
+		if !strings.HasPrefix(auth, prefix) || strings.TrimSpace(auth[len(prefix):]) != s.cfg.Token {


Critical — non-constant-time token comparison.

This is a timing side-channel an attacker can use to recover the token byte-by-byte by measuring response times across many requests. Practical over LAN, sometimes even over the open internet.

Suggested fix:

import "crypto/subtle" // ... provided := []byte(strings.TrimSpace(auth[len(prefix):])) expected := []byte(s.cfg.Token) if subtle.ConstantTimeCompare(provided, expected) != 1 { writeError(w, http.StatusUnauthorized, "unauthorized", "missing or invalid bearer token") return }

(ConstantTimeCompare returns 0 if the lengths differ as well, so this also covers the length-equal-but-content-mismatch case.)

rafeegnash · 2026-05-24T13:31:23Z

+// Run starts the HTTP server and blocks until ctx is cancelled or
+// ListenAndServe returns an error.
+func (s *Server) Run(ctx context.Context) error {
+	if strings.TrimSpace(s.cfg.Token) == "" {


Critical — empty token disables auth on every route, including POST /maker/apply.

With cfg.Token == "" the middleware short-circuits and lets every request through. That includes plan execution against live Tencent credentials. The first example in the help text shows clanker server --port 8080 with no token, so this is what operators will run by default in dev — and a malicious page loaded on the same machine could then call POST /maker/apply cross-origin (CORS defaults to *).

Could we either:

Refuse to start without a token unless an explicit --insecure flag is passed, or

At minimum, block POST /maker/apply (and any other mutating route) when the token is unset, even if read-only routes stay open?

The one-line WARNING log is easy to miss in a busy terminal.

rafeegnash · 2026-05-24T13:31:23Z

+		if !ok {
+			return false
+		}
+		re, err := regexp.Compile(value)


Critical — ReDoS via user-supplied regex.

value comes from the plan JSON, which in the HTTP path comes from the POST /api/v1/maker/apply body. A pattern like (a+)+$ against a long string will spin a goroutine that context.Done() cannot preempt — Go's regexp package doesn't support deadline-based cancellation.

Options:

Pattern-length cap + complexity heuristic — reject patterns over ~256 chars or containing nested quantifiers ((a+)+, (a*)*, (a|a)*).

regexp2 package — drop-in replacement with MatchTimeout support.

Pre-validate via regexp/syntax.Parse and walk the AST rejecting unbounded backtracking.

Option 1 is the smallest change. Option 2 is most robust if you're OK adding a dep.

At minimum, bound the string length being matched against — if s is a huge SDK response field, the worst-case input grows.

rafeegnash · 2026-05-24T13:31:23Z

+// tencentClient builds a Tencent client for this request. The region can be
+// overridden per request via ?region=ap-jakarta; otherwise the daemon's
+// default (resolved from config / env at startup) is used.
+func (s *Server) tencentClient(r *http.Request) (*tencent.Client, error) {


High — ?region= query param accepted without validation.

The value is passed verbatim into the Tencent SDK as the region, which becomes part of the request signing and the endpoint hostname. The hostname is pinned to *.tencentcloudapi.com by the SDK so it's not arbitrary SSRF, but a caller can:

Enumerate which regions the credential has access to via differential responses.

Cause unintended API charges by looping through bogus region strings.

Validate against ^[a-z]+-[a-z]+(-[0-9]+)?$ or check membership in ListAllRegions(). Cheap to add.

rafeegnash · 2026-05-24T13:31:23Z

+		cfg.Addr = ":8080"
+	}
+	if cfg.CORSOrigin == "" {
+		cfg.CORSOrigin = "*"


Medium — CORS default is *.

Bearer auth via Authorization header mitigates classic CSRF (browsers won't auto-send), but a wildcard default still feels too permissive — any origin can read responses on a successful auth, and combined with the empty-token case above, an unset token on a clanker server instance is reachable from any web page.

Suggest defaulting to http://localhost and requiring an explicit --cors-origin to relax it.

rafeegnash · 2026-05-24T13:31:23Z

@@ -0,0 +1,1467 @@
+package tencent


Should-fix — ctx parameter is accepted but never propagated to the Tencent SDK.

The SDK's generated clients don't expose WithContext variants, so passing ctx here is purely cosmetic — a Ctrl-C during clanker ask --tencent won't interrupt anything in flight. The only function that respects the context is contextCOS via its own context.WithTimeout (which would still wait on the SDK call to return).

Two options:

Document the limitation at the top of this file so the next person doesn't expect cancellation to work.

Wire HttpProfile.ReqTimeout on each per-service profile.NewClientProfile() so calls at least have a hard ceiling.

Option 2 is the safer choice — without it a hung Tencent endpoint blocks the gather goroutine indefinitely.

rafeegnash · 2026-05-24T13:31:23Z

+	return out.String(), nil
+}
+
+// contextCVMs returns a compact JSON array of CVMs in this client's region.


Should-fix — silent truncation at Limit = 100.

The gather functions cap at 100 items per resource type and don't paginate. An account with 150 CVMs sees only the first 100 in the LLM's context, with no signal that more exist. The CLI list commands (e.g. internal/tencent/cvm.go) paginate correctly with an offset loop, so the pattern exists in the codebase.

At minimum, surface the truncation: AWS does this with (showing first %d roles) in the error/info path. Ideally, paginate fully (or to a higher cap like 500) so the LLM gets the full picture for at least small-to-medium accounts.

rafeegnash · 2026-05-24T13:31:23Z

+const defaultRegion = "ap-singapore"
+
+// Credentials holds the resolved Tencent Cloud credentials and target region.
+type Credentials struct {


Nit — credentials struct has no redaction.

No String() / MarshalJSON method, so any %+v or %v formatting of Credentials (or any struct that embeds it, like Client) leaks SecretKey into logs. No current call site does this, but it's one careless debug line away. AWS has the same gap, so this is parity — but worth tightening on both sides:

func (c Credentials) String() string { return fmt.Sprintf("Credentials{SecretID: %q, Region: %q, SecretKey: <redacted>}", c.SecretID, c.Region) }

rafeegnash · 2026-05-24T13:31:23Z

+			"data":    []interface{}{},
+			"warning": err.Error(),
+		})
+		_ = raw


Nit — GetRelevantContext result is discarded then gatherTencentByType is called separately, doubling the API calls.

The raw variable above is assigned and then the code path just continues to gatherTencentByType(...), which makes the same SDK calls a second time. Either consume raw here or drop the GetRelevantContext call entirely from this handler.

rafeegnash · 2026-05-24T13:31:23Z

+// validateTencentCommand rejects anything that isn't a well-formed tencent-api
+// call. Destructive actions (Terminate*, Delete*, Reset*) are gated behind
+// --destroyer to match the policy applied to every other provider.
+func validateTencentCommand(args []string, allowDestructive bool) error {


Should-fix — no test coverage for validateTencentCommand or validateFilterCommand.

This is a security-relevant validator (rejects newlines, gates destructive actions, enforces arg-count bounds). A table-driven test covering each error path — empty command, wrong verb, missing service/action/region, newline in args, destructive without --destroyer, oversized arg count — would help prevent regressions. The same applies to filterMatch in exec_tencent_filter.go.

rephapeng · 2026-05-24T13:50:03Z

cool thank you for all comment and fixing recommendation I will check it directly

Adds a Tencent Cloud provider to the Clanker CLI alongside the existing AWS/GCP/Azure/Cloudflare/Fly/Verda/Vercel/Railway providers. Built up in 15 phases over the past month against the bgdnvk/clanker upstream; this PR consolidates the full provider as one contribution. Coverage * Compute cvm + lighthouse (lightweight cloud server) * Network vpc, subnet, security-group + rule audit, eip, clb, nat, vpn, ccn, direct-connect * Storage cbs (Cloud Block Storage), cos (Object Storage) * Database mysql (cdb), postgres, redis, mongodb, cynosdb (tdsql-c) * Container tke clusters + kubeconfig fetch * Edge cdn, edgeone, waf, anti-ddos * Identity cam users * Observability cloud monitor metrics (CVM + Lighthouse), cls log topics, cloud audit tracks, alarm policies * Cost monthly billing by product + top-N resources * Tags flat map[string]string surfaced on all summary structs that the SDK returns tags for (CVM, Lighthouse, VPC, Postgres); reflection-based helper handles the SDK's inconsistent tag-field naming across services. Security audits * public-cvm-exposure CVMs with sensitive ports open to 0.0.0.0/0 * clb-exposure public CLBs with risky listeners * db-exposure managed DBs reachable from the public internet * idle-eips EIPs unbound but billed * unencrypted-cbs CBS volumes without encryption * cert-expiry SSL certs expiring within N days * cam-hygiene CAM users missing phone/email * waf-coverage CDN/EdgeOne hosts not covered by WAF * antiddos-coverage account anti-DDoS posture + per-region targets * audit-coverage Cloud Audit tracks status HTTP API Hooks into the existing `clanker server` route table with bearer-auth endpoints for inventory (/api/v1/tencent/resources/{type}), scans (/api/v1/tencent/scan/{kind}), monitoring (/api/v1/tencent/metrics/ {cvm|lighthouse}), cost (/api/v1/tencent/cost/by-product, /resources), and topology (/api/v1/tencent/topology). Maker plan and apply share the existing /api/v1/maker/{plan,apply} endpoints. Maker integration * tencent-api verb 5-arg form [tencent-api, service, action, region, json-params] dispatches to a generic SendRaw over Tencent's CommonRequest signed transport. No tccli dependency. * tencent_prompts.go planner system prompt with chain shapes A-H, anti-patterns, static-spec vs runtime metrics rules, and the bgdnvk-style filter verb example. * filter verb new [filter, sourceIdx, arrayPath, field, op, value] post-processor returns the matching subset of a prior command's output. Operators: > < >= <= == != contains startsWith matches. Lets Maker answer "find X by criteria" queries directly instead of dumping full inventory. * [*] array placeholders jsonPathString in internal/maker/exec.go now handles $.X[*].Y wildcard paths, binding to a JSON array literal so "InstanceIds":<CVM_IDS> chains work. * Action denylist knownHallucinatedActions catches common LLM-invented Tencent action names with "did you mean..." hints before the round- trip (GetProductMetricData -> GetMonitorData, cvm.ListInstances -> DescribeInstances, etc). Credentials Reads in this order: viper tencent.{secret_id,secret_key,region}, then TENCENTCLOUD_SECRET_* (official Tencent SDK env names), then TENCENT_SECRET_* short aliases. Default region ap-singapore. Dependencies * tencentcloud/tencentcloud-sdk-go/{cvm,vpc,cbs,clb,cdb,postgres, redis,mongodb,tke,tag,cam,monitor,cls,billing,lighthouse,...} The provider is fully read-by-default; write operations (Create/Modify/Run) go through Maker's existing plan+apply gate, with destructive Terminate/Delete/Reset/Release/Discontinue actions additionally requiring the existing --destroyer flag.

Tencent's two monitor APIs disagree about the canonical dimension casing for namespace QCE/LIGHTHOUSE: - DescribeBaseMetrics (metadata) reports "instanceid" (lowercase) - GetMonitorData (data) accepts "InstanceId" (PascalCase) We trusted DescribeBaseMetrics when building lighthouse.go, which is why every Lighthouse metric call has been returning Tencent's misleading "[InvalidParameterValue] : unauthorized operation or the instance has been destroyed" — same error code Tencent reuses for genuine permission gaps and lifecycle issues, which sent the debug down two days of false leads (CAM, agent install, account type, sub-user vs root, ...). The user's CAM is genuinely AdministratorAccess. The Cloud Monitor agent is installed. The Tencent Console displays the metrics fine. The data was always there — Tencent's data API just rejected our spelling of the dimension name with a wildly inappropriate error code. PascalCase is the same form CVM uses, so the fix is a one-character change to the lighthouseDimensionKey constant. Added a comment block explaining the discrepancy so the next person who reads this code doesn't repeat the investigation. Verified live against lhins-fprj6w5h (ap-singapore): CpuUsage 0.80% (avg 0.96%, 59 samples) MemUsage 38.79% (avg 38.72%, 59 samples) DiskUsage 22.18% (60 samples) LighthouseOutpkg 1 (avg 1.22, 59 samples)

DescribeBillSummaryByProduct (the per-product cost call) returns RealCost but no tax field, so Clanker cost totals never matched the Tencent console tax-inclusive headline. RealCost is total consumption (voucher + cash + tax); the console headline is cash out of pocket. Adds billFeeSummary() — calls DescribeCostExplorerSummary with Dimensions=feeType, FeeType=cost, the only billing API that breaks out tax. BillByProductJSON now embeds a summary object: consumption total RealCost (voucher + cash + tax) voucher amount covered by vouchers cash_before_tax cash portion, pre-tax tax tax amount cash_incl_tax cash_before_tax + tax (matches console headline) The Detail item names are localized display strings; the cash line is "Total Amount After Discount (Excluding Tax)" which contains the word "tax", so the substring match checks "discount" before "tax" to avoid misclassifying the cash line as tax. Verified against a real April 2026 bill: consumption 11,146.37 voucher 5,701.68 cash_before_tax 4,905.13 tax 539.56 cash_incl_tax 5,444.70 = console "Total Cost (Incl Tax)"

- DescribeVoucherInfo / DescribeVoucherUsageDetails: voucher inventory, balances, per-voucher usage history, and a per-owner-UIN breakdown of voucher spend (nominal - balance). Voucher APIs only answer on the account's home region, so they use a region-aware billing client. - VoucherByOwnerJSON: month-scoped voucher deduction grouped by the owner account UIN of each billed resource, via DescribeBillResourceSummary (the voucher APIs carry no per-record UIN). - CLI: `clanker tencent cost vouchers` / `cost voucher-usage`. - HTTP API: /cost/vouchers, /cost/voucher-usage/{id}, /cost/voucher-by-owner.

Extend the slim JSON shape returned by /api/v1/tencent/resources/* for every subscription-capable resource: CVM, Lighthouse, CBS, MySQL, Postgres, Redis, MongoDB, CynosDB, CLB, AntiDDoS. PREPAID entries now carry the renewal deadline so callers can see what is about to expire without a separate billing call. Tencent uses two billing-mode conventions (string vs int) and the int form has NO consistent mapping across services — CDB inverts vs the others. internal/tencent/charge_mode.go centralizes the normalization into 'PREPAID' / 'POSTPAID' strings so the JSON shape is uniform.

The maker plan executor (SendRaw) gated calls behind a hand-maintained service map in raw.go, which silently lagged behind the SDK — calling lighthouse.DescribeInstances failed with 'unsupported tencent service' even though every other code path (typed clients, dashboard) handled it fine. Eight other services (antiddos, billing, cdn, cloudaudit, cynosdb, dc, ssl, teo, waf) were also missing from the map. Replace the static map with a go:generate-driven one. gen_services.go walks GOMODCACHE/.../tencentcloud-sdk-go/tencentcloud/*/v* and emits service_versions_gen.go with one entry per service (latest version when multiple are vendored). The cos sentinel is preserved via a manualOverrides map in the generator. Upgrading the SDK now just needs 'go generate ./internal/tencent/...'. The error message in SendRaw now enumerates services from the generated map so it stays accurate.

Add auto_renew (*bool, omitempty) to the slim JSON returned by /api/v1/tencent/resources/* for every prepaid-capable resource where the SDK exposes the renew flag: CVM, Lighthouse, CBS, MySQL, Postgres, Redis, CynosDB, and CLB. Consumers can now check whether an expiring resource will auto-renew or needs manual action — without the field, expiring auto-renewers would generate false-positive alerts. Skipped MongoDB and AntiDDoS — their list endpoints don't expose the renew flag (would require a separate DescribeAutoRenew-style call). Tencent's renew encoding has three flavors handled by new normalizers in charge_mode.go: string 'NOTIFY_AND_AUTO_RENEW' (CVM/CBS/Lighthouse) and 'AUTO_RENEW' (CLB nested in PrepaidAttributes), int64 1 (CDB/Redis/ CynosDB), uint64 1 (Postgres). Using *bool + omitempty so consumers can distinguish 'not on auto-renew' from 'no info available'.

Three critical findings from rafeegnash's review. 1. Constant-time token comparison (api/middleware.go) The previous '!=' compare leaked information through response timing, letting an attacker recover the bearer token byte-by-byte by measuring latency across many requests. Switch to crypto/subtle.ConstantTimeCompare. 2. Refuse to start without a token (api/server.go, cmd/server.go) An empty --token previously disabled auth on every route, including POST /api/v1/maker/apply which can mutate real cloud resources. The server now aborts startup unless --insecure is explicit (or CLANKER_API_TOKEN is set). Help text now shows the token-gated invocation as the default example. 3. Bound the filter 'matches' operator (maker/exec_tencent_filter.go) regexp.Compile took an unbounded user-controlled pattern via the maker plan. Go's RE2 is linear so a true ReDoS is not realistic, but we now cap pattern length at 256 chars and run MatchString under a 100ms wall-clock deadline as defense-in-depth — the filter value originates from LLM output and reaches us through the HTTP API, so tighter bounds keep that surface predictable.

rephapeng · 2026-05-24T14:19:56Z

Thanks again @rafeegnash for the thorough review. First pass landed — 1bbb914.

Rebase

Branch is now rebased onto current upstream/master (d4d94f9). The false-deletion diff from the stale 3e65a24 base is gone. Conflicts didn't materialize in the end — cmd/root.go / cmd/ask.go reconciled cleanly with the new k8s subcommands.

Three critical security findings — all fixed

internal/api/middleware.go:43 — now crypto/subtle.ConstantTimeCompare with a length check first. Comment explains why leaking length is acceptable here (token length is server-startup config, never user-controlled).
internal/api/server.go:67 — server now refuses to start without a token. Added --insecure flag for the trusted-network/localhost case; help text now leads with the token-gated example instead of the open one.
internal/maker/exec_tencent_filter.go:182 — pattern length capped at 256, MatchString runs under a 100 ms wall-clock deadline via goroutine+select. Inline comment notes Go's RE2 makes a true ReDoS unrealistic but bounds the surface for any future engine swap.

Next

Working through the High / Should-fix items now (region validation, CORS default, destructive-action allowlist, ctx propagation, pagination, filter tests). Will push those as a separate commit so re-review can split.

On the PR split — let me get the security pass done first, then I'll come back to you on whether to split. Some of the High-severity items overlap maker + API so the boundary isn't as clean as it looked initially.

Three high-severity items from rafeegnash's review. 1. Validate ?region= before it reaches the Tencent SDK (api/routes.go) Previously any string was passed verbatim into the client's region — enabling enumeration of arbitrary Tencent endpoints and potentially driving unintended API charges. Added a regex covering all current region prefixes (ap|na|eu|sa|cn)-name(-suffix)?, with a typed *errInvalidRegion so the handler chokepoint surfaces 400 instead of the catch-all 401. 23 handler call sites switched to a small writeTencentClientErr helper. 2. CORS default no longer wildcard (api/server.go, cmd/server.go) Default Access-Control-Allow-Origin was "*", letting any page read API responses. Bearer auth via header mitigates CSRF but a hostile origin could still siphon data when a user pastes their token there. Default is now http://localhost:4173 (the bundled dashboard); pass --cors-origin explicitly for non-localhost deployments. 3. Switch destructive check to read-only allowlist (maker/exec_tencent.go) isTencentDestructive previously prefix-matched only Terminate|Delete| Destroy|Reset|Release|Discontinue. CAM mutations like AddUser, CreateAccessKey, AttachUserPolicy slipped through without --destroyer. Flipped to an allowlist of read-only verb prefixes (Describe, Get, List, Query, Lookup, Search, Check, Inquiry). Anything else now requires --destroyer — fail-safe by default. Behavior change to be aware of: verbs that don't match a read prefix (Create*, Run*, Add*, Modify*, Set*, Enable*, Bind*, Associate*, Allocate*, etc.) now require --destroyer. ResetInstancesPassword was previously whitelisted as 'only changes the password' — that whitelist is gone because a password reset locks out anyone using the previous credential, which is a security-affecting mutation.

rephapeng · 2026-05-24T15:25:12Z

High-severity tier landed in 2a18f7b on top of the security commit.

?region= validation (internal/api/routes.go)

Regex ^(ap|na|eu|sa|cn)-[a-z]+(-[a-z]+)?$ — covers all current region prefixes including 3-segment ones like ap-shanghai-fsi. Region NAMES never contain digits (those are zone-level), so numeric segments are correctly rejected.
New typed *errInvalidRegion; chokepoint helper writeTencentClientErr returns 400 + invalid_region for validation failures vs the existing 401 + tencent_credentials for credential failures. 23 handler call sites switched over.
Verified: SQLi-style strings, unknown prefixes, and zone-with-digits all return 400 with a clear message; valid regions (ap-jakarta, ap-singapore, ap-shanghai-fsi) return 200.

CORS default (internal/api/server.go, cmd/server.go)

Default Access-Control-Allow-Origin is now http://localhost:4173 (the bundled dashboard). --cors-origin flag remains for explicit override; help text updated to discourage *.

Destructive denylist → read-only allowlist (internal/maker/exec_tencent.go)

Allowlisted prefixes: Describe, Get, List, Query, Lookup, Search, Check, Inquiry. Anything else now requires --destroyer.
Verified end-to-end through clanker_tencent_maker_apply: all three CAM cases you flagged (AddUser, CreateAccessKey, AttachUserPolicy) are now blocked without --destroyer. Reads (DescribeInstances, GetMonitorData, ListUsers, SearchLog) still pass. Previously-allowed mutations (RunInstances, ModifyVpcAttribute, AssociateAddress) now also need --destroyer.
Removed the ResetInstancesPassword whitelist — locking out the previous credential is a security-affecting mutation, so it now needs --destroyer too.

Next batch (Should-fix tier) will be: ctx propagation, pagination beyond 100, and tests for validateFilterCommand / filterMatch.

Three items from rafeegnash's review. 1. ctx propagation + per-request timeout (tencent/profile.go, all clients) The Tencent SDK has no WithContext variants — caller ctx cancellation cannot interrupt a request in flight. As a defense, every typed client now flows through newClientProfile(endpoint), which sets HttpProfile.ReqTimeout = 30s. Combined with a ctxDone() check between pagination pages and between GetRelevantContext sections, Ctrl-C now bounds the wall-clock cost of cancellation to the single in-flight SDK call (was: indefinite). All 22 profile.NewClientProfile() + Endpoint sites were converted to the helper; the now-unused profile import was stripped from those files. 2. Paginate GetRelevantContext past 100 (tencent/context.go) contextCVMs, contextVPCs, contextSecurityGroups now loop through pages with offset/limit until TotalCount is exhausted, capped at gatherMaxItems (1000) with a logGatherTruncated() warning when the cap fires. These are the highest-cardinality types — production accounts commonly cross 100 here. Other gather functions still single-call at limit=100; bringing them up is mechanical follow-up. 3. Tests for filter validator and matcher (maker/exec_tencent_filter_test.go) Table-driven coverage for validateFilterCommand (arg count, sourceIdx, op enum) and filterMatch (every operator + every JSON value type). Includes the ReDoS-defense cases added in the critical-tier commit: oversize pattern returns false, malformed regex returns false, PCRE- style catastrophic-backtrack patterns don't hang.

rephapeng · 2026-05-25T01:50:02Z

Should-fix tier landed in e5530b3.

ctx + per-request timeout (internal/tencent/profile.go, all clients)
The SDK exposes no WithContext variants so caller ctx cancellation cannot interrupt a request in flight — added a newClientProfile(endpoint) helper that sets HttpProfile.ReqTimeout = 30s and refactored all 22 profile-constructor sites to use it. Also added a ctxDone(ctx) check between pagination pages and between GetRelevantContext sections so Ctrl-C bounds wall-clock cost to one in-flight SDK call instead of indefinite.

Pagination (internal/tencent/context.go)
contextCVMs, contextVPCs, contextSecurityGroups now loop with offset/limit until TotalCount is reached, capped at gatherMaxItems = 1000 with logGatherTruncated() when the cap fires. These were the highest-cardinality types — production accounts cross 100 here most often. The remaining gather functions still single-call at limit=100; bringing them up is mechanical follow-up and I held back to keep this commit reviewable.

Filter tests (internal/maker/exec_tencent_filter_test.go)
Table-driven coverage for validateFilterCommand (arg count, sourceIdx, op enum — both happy paths and each rejection reason) and filterMatch (every operator × representative JSON value types). Also locks in the ReDoS-defense behavior added in the critical-tier commit: oversize patterns return false, uncompilable regexes return false, PCRE-style catastrophic-backtrack patterns don't hang. go test ./internal/maker/ passes.

Remaining from the review:

The nits (credential redaction, doubled gather call in routes.go:104-128, NewClientWithBackendCredentials factory, paramsJSON per-field size cap)
Optional: paginate the other gather functions

I can roll the nits into a follow-up if you want — they're all tiny and could land together.

Four small items from the review. 1. Credentials redaction (tencent/client.go) Added String() and MarshalJSON() on Credentials so %v / %+v / Println and json.Marshal all render SecretKey as **** instead of leaking the raw key. Direct field access (the SDK signature path) is unchanged. 2. Drop doubled gather call (api/routes.go) handleTencentResources called GetRelevantContext (full multi-section gather) and discarded the result, then called gatherTencentByType for the requested type — doubling SDK calls per request. Removed the GetRelevantContext call. 3. NewClientWithCredentials factory for parity (tencent/client.go) Added BackendTencentCredentials struct and NewClientWithCredentials constructor matching the shape AWS / GCP / Fly.io / etc. already use. Not wired into the backend credential flow yet — kept for consistency so the dispatch layer can treat Tencent the same as every other provider. 4. Per-field size cap on paramsJSON (tencent/raw.go) maxParamsJSONBytes (256 KiB total) + maxParamsFieldBytes (64 KiB per string field, walked recursively into nested maps and slices). The effective cap was the 1 MiB HTTP body limit, which is far larger than any legitimate Tencent action payload — 64 KiB still fits user-data scripts and policy documents while rejecting accidentally-pasted dumps from an LLM plan.

rephapeng · 2026-05-25T02:21:48Z

Nits tier landed in 7a71923. With this, every item from your review is addressed.

Credentials redaction (internal/tencent/client.go) — added String() + MarshalJSON() so %v/%+v/Println/json.Marshal all render SecretKey as ****. Direct field access (the SDK signature path) is unchanged so functionality is identical.
Doubled gather call (internal/api/routes.go) — handleTencentResources no longer calls GetRelevantContext before gatherTencentByType. Each request now hits the SDK once instead of twice. Verified end-to-end (lighthouse/cvm/cbs lists still return data).
NewClientWithCredentials factory (internal/tencent/client.go) — added BackendTencentCredentials struct + constructor matching the shape AWS / GCP / Fly.io already use, so the backend dispatch layer can treat Tencent uniformly once Tencent is wired into the backend credential flow.
paramsJSON size cap (internal/tencent/raw.go) — maxParamsJSONBytes = 256 KiB total, maxParamsFieldBytes = 64 KiB per string field (walked recursively into nested maps and slices). Effective ceiling was the 1 MiB HTTP body limit, which is well over any legitimate Tencent action payload.

Status of the full review:

Tier	Status
Critical (3)	Shipped in `1bbb914`
High (3)	Shipped in `2a18f7b`
Should-fix (3)	Shipped in `e5530b3`
Nits (4)	Shipped in `7a71923`
Rebase onto current master	Done — branch base is `d4d94f9`

The only thing left from your notes is paginating the remaining low-cardinality gather functions (CAM, NAT, VPN, CCN, DC, etc.) — those typically have <10 items per account so they don't hit the 100-row truncation in practice, but I'm happy to do them if you'd prefer everything paginated for uniformity.

Ready for another pass whenever you have a moment. Thanks again for the thorough first review.

Cron-facing alert that walks every PREPAID-capable resource (CVM, Lighthouse, CBS, MySQL, Postgres, Redis, MongoDB, CynosDB, CLB, AntiDDoS, and SSL with --include-ssl) across the requested regions and returns items at or below a renewal threshold. • CLI: clanker tencent expiry --regions=ap-x,ap-y --threshold=14 Exit 0 = nothing flagged, 1 = items in window, 2 = already expired. Drop-in for crontab + MAILTO, GitHub Actions, etc. • HTTP: GET /api/v1/tencent/expiry?regions=&threshold=&manual_only=&include_ssl= Returns the full report with counts breakdown (total / flagged / expired / auto_renew). Region query param is validated through the same regex SendRaw uses, so SSRF-shaped inputs are rejected. manual_only defaults to true so the cron only surfaces items that won't auto-renew (the actionable subset); auto-renewers are still counted in counts.auto_renew for visibility.

rafeegnash

Just walked the new revision end-to-end — really nice work, this looks great overall. Every item from the first review is properly addressed:

Rebase is clean (base is 2728110, no K8s deletions).
Constant-time token compare — verified.
Server refuses to start without a token (or explicit --insecure) — built the binary and confirmed empirically.
ReDoS defense on filter matches — pattern cap + goroutine-with-timeout on top of Go's RE2.
Region regex chokepoint with the typed *errInvalidRegion.
CORS default narrowed to http://localhost:4173.
Destructive allowlist (Describe|Get|List|Query|Lookup|Search|Check|Inquiry) — AddUser / CreateAccessKey / AttachUserPolicy correctly blocked without --destroyer.
HttpProfile.ReqTimeout = 30s + ctxDone between pages.
Pagination on CVM/VPC/SG up to 1000 with truncation warning.
Filter tests covering happy + sad paths + ReDoS defense.
Credential String() + MarshalJSON() redaction.
Doubled GetRelevantContext call removed.
NewClientWithBackendCredentials factory.
paramsJSON size caps.

Also reviewed the new /api/v1/tencent/expiry endpoint from 1a23646 — it validates regions / threshold / booleans cleanly. 👍

One blocker I caught while running the built binary: a single-request crash in writeTencentClientErr from the security commit (see inline comment). Genuinely a tiny typo-class bug — one line change to fix — but since the credential-missing error path is the most common non-region failure, any prod restart without env vars set hits it. Worth a quick httptest covering "no creds → 401" to lock it in.

A few smaller items I noticed while looking, none are blockers:

internal/api/routes_maker.go:147 — countDestructiveCommands still uses the old prefix denylist (Terminate|Delete|Destroy|Reset|Release|Discontinue) with the ResetInstancesPassword carve-out. Now that the executor is allowlist-based, the audit record undercounts: an AddUser blocked by --destroyer records as 0 destructive commands. Easy to fix by calling into isTencentDestructive from exec_tencent.go.
internal/api/middleware.go — failed bearer attempts return 401 but never reach logMiddleware (it's nested inside auth), so 401s are silent. A one-line log in the 401 branch would help for prod observability.
internal/api/server.go:28 comment still says "*" by default — should reflect the new http://localhost:4173 default.
internal/maker/exec_tencent.go:124-126 docstring still describes the old prefix denylist; should be updated to the allowlist.
routes_plan.go returns 200 OK + a warning field when the LLM emits unparseable JSON — would be cleaner as a 4xx so clients don't have to inspect the body to know it failed.

If you fix the recursion + roll a quick test for it, the rest of the small stuff can either come along in the same commit or land as a tiny follow-up — totally up to you. Thanks again for the careful work on this round, the diff reads really well now.

rafeegnash · 2026-05-25T11:18:26Z

+		writeError(w, http.StatusBadRequest, "invalid_region", err.Error())
+		return
+	}
+	writeTencentClientErr(w, err)


Tiny but critical bug from the security-tier cleanup — this branch recurses with the same args instead of returning the credential-failure response. Any non-*errInvalidRegion error (specifically the one tencent.NewClient returns when credentials are missing) hits this line and the function calls itself until the goroutine stack runs out.

Reproduced by building the PR + running clanker server --port 18080 --token x with no Tencent creds set, then curl -H 'Authorization: Bearer x' http://127.0.0.1:18080/api/v1/tencent/regions — the process dies with:

fatal error: stack overflow github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...) github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...) github.com/bgdnvk/clanker/internal/api.writeTencentClientErr(...) ...

Single authenticated request → server gone. Worth catching because credential rotation in prod would hit this without anyone touching the server.

One-line fix matching the intent documented in the PR description:

func writeTencentClientErr(w http.ResponseWriter, err error) { var ir *errInvalidRegion if errors.As(err, &ir) { writeError(w, http.StatusBadRequest, "invalid_region", err.Error()) return } writeError(w, http.StatusUnauthorized, "tencent_credentials", err.Error()) }

Would also be a good place to add a tiny httptest against any one Tencent handler with no creds set — that test would have caught this in CI.

Six items from rafeegnash's second-pass review. The first is a real production-killer; the rest are correctness + hygiene. 1. writeTencentClientErr no longer infinite-recurses (api/routes.go) The catch-all branch from the security commit called itself instead of writeError — stack overflow on the very first credential-missing request, taking the process with it. New regression test in api/routes_test.go (httptest + direct helper invocation, including the wrapped-error errors.As path) locks the fix in. 2. countDestructiveCommands uses the live classifier (api/routes_maker.go) The plan audit count was still computed from the old prefix denylist, so AddUser / CreateAccessKey / AttachUserPolicy gated by --destroyer recorded as 0 destructive commands. Exported IsTencentDestructive from the maker package; routes_maker now delegates to it so the audit count never drifts away from what the executor's safety gate enforces. 3. 401s no longer silent (api/middleware.go) authMiddleware wraps logMiddleware so a rejected request short-circuits before any access log fires. Added s.log401() that records method + path + remote_addr + reason on every failed bearer attempt, so prod credential rotations or auth attacks show up in stderr. 4. routes_plan returns 422 on unparseable LLM output (api/routes_plan.go) Was 200 + 'warning' field, which forced clients to inspect the body to know it failed. 422 Unprocessable Entity is the right semantic — we understood the request but the upstream result was unprocessable. The raw cleaned text is still in data.plan so the dashboard editor can offer hand-correction. 5. Comments brought in line with code (api/server.go, maker/exec_tencent.go) Config.CORSOrigin doc said '"*" by default'; now says http://localhost:4173. validateTencentCommand docstring described the old prefix denylist; now describes the allowlist.

rephapeng · 2026-05-25T13:48:59Z

Round-2 fixes landed in 6e2373b. Thanks for the careful walk-through and especially for catching the recursion — that one would've been bad on a credential rotation.

Recursion fix + regression test (internal/api/routes.go, internal/api/routes_test.go)
The catch-all branch now writes the 401 envelope instead of calling itself. New routes_test.go adds three table cases via httptest: (1) no credentials → 401 with tencent_credentials (the case that previously stack-overflowed), (2) ?region=evil%20region → 400 with invalid_region, (3) direct writeTencentClientErr calls including the wrapped-with-%w path through errors.As. go test ./internal/api/ passes.

countDestructiveCommands delegates to the live classifier (internal/api/routes_maker.go)
Exported IsTencentDestructive from maker/exec_tencent.go; routes_maker now calls it instead of carrying a duplicate denylist. The audit-record count tracks the executor's safety gate by construction — AddUser / CreateAccessKey / AttachUserPolicy are now counted as destructive in the history record same as they're blocked in execution.

401 logging (internal/api/middleware.go)
Added s.log401(r, reason) that writes [api] 401 METHOD URI from REMOTE — REASON on every rejected attempt. logMiddleware sits inside authMiddleware so 401s never reach the normal access log — verified with a wrong-token curl that the new log line shows up in stderr.

routes_plan returns 422 on unparseable LLM output (internal/api/routes_plan.go)
Now 422 Unprocessable Entity with an error.code = unparseable_plan envelope. The raw cleaned text and original parse error are still in data.plan so the dashboard editor can offer hand-correction; just the status code changed.

Stale comments updated

Config.CORSOrigin doc: "*" by default → defaults to http://localhost:4173.
validateTencentCommand docstring: prefix-denylist text → describes the read-only allowlist (Describe / Get / List / Query / Lookup / Search / Check / Inquiry).

Let me know if you'd like the destructive-classifier moved to a shared location (it currently lives in maker and gets imported by api) or anything else from this batch tweaked.

rafeegnash · 2026-05-25T15:34:45Z

Merged 🎉 Thank you for the careful work on this — really appreciate how thoroughly you walked through every item across the three review rounds. Recursion fix verified empirically against the built binary (same trigger that previously stack-overflowed now returns a clean 401 + JSON envelope, server stays up, 401 appears in the logs with method / URI / remote / reason). All five other items also addressed cleanly, and the new internal/api/routes_test.go will keep this from regressing in CI.

Tencent provider is now in master alongside the other clouds. Welcome aboard, and thanks again — this is great work to start from. 🙌

Upstream merged PR bgdnvk#165 (Tencent provider) and added work on top: k8s SRE playbooks (bgdnvk#174), SRE agent fix (bgdnvk#177), tree-wide gofmt -s (bgdnvk#176), README (bgdnvk#175), and three Tencent CLI features the fork lacked — `list --format json` (bgdnvk#179), `cost --format json` (bgdnvk#180), and security-scan CLI subcommands (bgdnvk#181). Conflict resolution: all 16 conflicts resolved to upstream's side. 14 were pure gofmt whitespace from bgdnvk#176 (identical code); billing.go and static_commands.go were upstream supersets adding the JSON/security CLI surface with no fork-unique code lost. Fixed a duplicate tencent import in cmd/ask.go left by the auto-merge. Verified in Docker (golang:1.25, -mod=mod): gofmt clean, go build ./..., go vet ./..., and go test ./... all pass.

rephapeng marked this pull request as ready for review May 18, 2026 14:23

rafeegnash reviewed May 24, 2026

View reviewed changes

rephapeng added 8 commits May 24, 2026 21:11

rephapeng force-pushed the pr/tencent-provider branch from 0e6182e to 1bbb914 Compare May 24, 2026 14:17

rafeegnash reviewed May 25, 2026

View reviewed changes

rafeegnash merged commit 2aa4d34 into bgdnvk:master May 25, 2026

Conversation

rephapeng commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Coverage

Security audits (10)

Maker integration

HTTP API

Credentials

Stats

Test plan

Uh oh!

bgdnvk commented May 23, 2026

Uh oh!

rafeegnash left a comment

Choose a reason for hiding this comment

Branch is out of date

Security notes on the HTTP API

High-severity items worth addressing

Should-fix before merge

Nits / parity notes

What we really like

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rephapeng commented May 24, 2026

Uh oh!

rephapeng commented May 24, 2026

Uh oh!

rephapeng commented May 24, 2026

Uh oh!

rephapeng commented May 25, 2026

Uh oh!

rephapeng commented May 25, 2026

Uh oh!

rafeegnash left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rephapeng commented May 25, 2026

Uh oh!

rafeegnash commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rephapeng commented May 18, 2026 •

edited

Loading