Skip to content

cubelet: add diagnostic context to newExt4RawByReflinkCopy errors#237

Open
WaylandYang wants to merge 1 commit into
TencentCloud:masterfrom
WaylandYang:feat/reflink-copy-error-diagnostics
Open

cubelet: add diagnostic context to newExt4RawByReflinkCopy errors#237
WaylandYang wants to merge 1 commit into
TencentCloud:masterfrom
WaylandYang:feat/reflink-copy-error-diagnostics

Conversation

@WaylandYang
Copy link
Copy Markdown

Implements proposal (2) from #235.

What it does

When newExt4RawByReflinkCopy fails, the returned error now appends a single-line diagnostic suffix:

newExt4RawByReflinkCopy failed:<stderr> [step=3/4 cmd="e2fsck -fy /a/target.raw" elapsed=812ms target=size=3221225472 base=size=536870912 free=5368709120B]

Fields: command index (1-of-N), the failing argv, elapsed time, target/base file stats (size or "missing"), and free bytes on the target's volume.

Format choices

  • Suffix on the existing message, not a replacement — any string-matching callers (none found in this repo, but defensive) still see the same "newExt4RawByReflinkCopy failed:<stderr>" prefix.
  • Single line so audit / log pipelines that split on \n don't separate the diagnostic from the underlying error.
  • Best-effort statsstat / statfs errors are reported inline ("stat err=<msg>") instead of failing the diagnostic itself; the caller already has a real error to return.

Out of scope (deliberately)

  • newExt4BaseRaw and newExt4RawByCopy have the same opaque-error shape and would benefit from the same describeStorageFailure helper, but the issue named newExt4RawByReflinkCopy specifically. Happy to extend to the other two in a follow-up if you'd like.
  • The helper is diagnostic-only; no retry / structured error refactor.

Tested

  • go vet ./storage/... — clean
  • go build ./storage/... — clean
  • gofmt -l — clean

Related

@WaylandYang WaylandYang requested a review from fslongjin as a code owner May 14, 2026 03:53
@fslongjin
Copy link
Copy Markdown
Member

/cubebot review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 14, 2026

Claude finished @fslongjin's task in 3m 23s —— View job


Code Review

  • Read CLAUDE.md for repo guidelines
  • Read the changed file and diff
  • Analyze code and provide review feedback

Overall: This is a clean, well-structured change that adds valuable diagnostic context to an otherwise opaque error path. The design choices (suffix approach, single-line format, best-effort stats) are all sound. Below are my findings.

Positive

  • The describeStorageFailure / describeFile / describeFreeBytes helpers are well-factored with clear doc comments and a defined format contract.
  • Best-effort stat handling (missing, stat err=..., statfs err=...) is the right call — failing the diagnostic would mask the real error.
  • Preserving the "newExt4RawByReflinkCopy failed:<stderr>" prefix is good for backward compatibility.
  • The started timer before the loop captures total elapsed time across all steps, which is more useful than per-command timing for diagnosing slow failures.
  • DCO sign-off is present.

Issues

1. No unit tests for the new helpers (should fix)

The three new unexported helpers (describeStorageFailure, describeFile, describeFreeBytes) are pure functions that are straightforward to unit-test without needing a reflink-capable filesystem. Per CONTRIBUTING.md: "Add tests — add focused test coverage when behavior changes." The existing shell_test.go only tests the happy path via integration-style tests; the diagnostic formatting logic has zero coverage.

Fix this →

2. Deferred error path missing diagnostics (minor, consider)

At shell.go:103, the defer block that handles FileExistAndValid failure still uses the old format:

err = fmt.Errorf("newExt4RawByReflinkCopy failed:%s", err)

If all commands succeed but the post-validation check fails, the error is still opaque. This is a different failure mode (not a command failure), so it may warrant different diagnostics, but as-is it's inconsistent with the new command-failure path. Consider appending at least target file stats here too, or adding a comment explaining why it's omitted.

3. stderr may contain newlines (observation, not a regression)

The PR description notes the format is "single line" so log pipelines don't split it. However, stderr from commands like e2fsck can contain newlines, so the error message was already potentially multi-line before this change. The diagnostic suffix will appear after the last newline of stderr. This is a pre-existing issue, not introduced by this PR, but worth noting since the stated goal is single-line output. A future enhancement could strings.ReplaceAll(stderr, "\n", " ") or truncate stderr.

4. path.Dir edge case (very minor)

In describeFreeBytes(path.Dir(target)) at shell.go:149 — if target is a bare filename without a directory component, path.Dir returns ".", which would statfs the process CWD rather than the target's volume. In practice, callers always pass absolute paths, so this is unlikely to trigger, but filepath.Dir would be more correct for OS paths.

Summary

The change is well-designed and ready to ship with one recommended addition: unit tests for the new helpers (issue #1). The other items are minor observations for potential follow-up.

Implements proposal (2) from TencentCloud#235.

On failure the error now appends step index, the failing argv,
elapsed time, target/base file stats, and free space:

    newExt4RawByReflinkCopy failed:<stderr> [step=3/4
    cmd="e2fsck -fy ..." elapsed=812ms target=size=...
    base=size=... free=...B]

Diagnostic is a suffix on the existing prefix so any string-matching
caller is unaffected. Stat/statfs errors are reported inline rather
than failing the diagnostic.

Adds unit tests for describeStorageFailure / describeFile /
describeFreeBytes covering the missing-file, empty-path, valid-stat,
statfs-error, and full-format-output cases.

Signed-off-by: Wayland Yang <wayland0916@gmail.com>
@WaylandYang WaylandYang force-pushed the feat/reflink-copy-error-diagnostics branch from f5c4c69 to 0fd59e9 Compare May 14, 2026 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants