fix: address code review feedback for background key rotation#16
fix: address code review feedback for background key rotation#16Copilot wants to merge 5 commits intofeat/add-headless-launchagentfrom
Conversation
Co-authored-by: shawntz <5200266+shawntz@users.noreply.github.com>
Co-authored-by: shawntz <5200266+shawntz@users.noreply.github.com>
Co-authored-by: shawntz <5200266+shawntz@users.noreply.github.com>
Co-authored-by: shawntz <5200266+shawntz@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses code review feedback from PR #7, focusing on fixing race conditions in config file access, correcting LaunchAgent misconfigurations, and improving key matching reliability for GitHub SSH key rotation.
Key Changes:
- Implements file locking with atomic writes to prevent concurrent config access issues between GUI and background agent
- Removes problematic LaunchAgent settings (
RunAtLoad,LaunchOnlyOnce) that caused immediate/concurrent execution - Improves key matching with exact field comparison and backward compatibility fallback for legacy key titles
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
packaging/macos/scripts/postinstall |
Extracts unload_agent() helper function to eliminate duplication, adds error handling for file operations, creates log directory for rotation agent |
packaging/macos/com.shawnschwartz.cassh.rotate.plist |
Removes RunAtLoad and LaunchOnlyOnce keys to prevent concurrent runs, changes log paths from /tmp to user-specific location |
internal/config/config.go |
Adds file locking mechanism with syscall.Flock and atomic write-then-rename pattern to prevent race conditions during concurrent config updates |
cmd/cassh-menubar/main.go |
Changes key matching from substring to exact field match, adds backward compatibility fallback to find keys by legacy title format |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fi | ||
|
|
||
| # Create log directory for rotation agent | ||
| CONSOLE_HOME=$(eval echo "~$CONSOLE_USER") |
There was a problem hiding this comment.
Using eval echo "~$CONSOLE_USER" to expand the home directory introduces potential security risks if CONSOLE_USER contains malicious input. While CONSOLE_USER is obtained from stat, which should be safe, it's better to use a more direct approach. Consider using getent passwd "$CONSOLE_USER" | cut -d: -f6 or the safer shell expansion ~"$CONSOLE_USER" (which bash supports) or simply sudo -u "$CONSOLE_USER" sh -c 'echo $HOME' to avoid the eval command entirely.
| CONSOLE_HOME=$(eval echo "~$CONSOLE_USER") | |
| CONSOLE_HOME=~"$CONSOLE_USER" |
| CONSOLE_HOME=$(eval echo "~$CONSOLE_USER") | ||
| LOG_DIR="$CONSOLE_HOME/Library/Logs/cassh" | ||
| if [ ! -d "$LOG_DIR" ]; then | ||
| sudo -u "$CONSOLE_USER" mkdir -p "$LOG_DIR" 2>/dev/null || true |
There was a problem hiding this comment.
The mkdir command uses || true to suppress errors, but if directory creation fails (e.g., due to permissions), the LaunchAgent will fail to write logs without any indication why. Consider logging a warning message if directory creation fails, or at least checking if the directory exists after the creation attempt to provide better debugging information.
| sudo -u "$CONSOLE_USER" mkdir -p "$LOG_DIR" 2>/dev/null || true | |
| if ! sudo -u "$CONSOLE_USER" mkdir -p "$LOG_DIR" 2>/dev/null; then | |
| echo "Warning: Failed to create log directory '$LOG_DIR'. Rotation logs may not be written." >&2 | |
| fi |
| ROTATE_AGENT_DST="/Library/LaunchAgents/com.shawnschwartz.cassh.rotate.plist" | ||
| ROTATE_LABEL="com.shawnschwartz.cassh.rotate" | ||
|
|
||
| # unload_agent - Helper function to unload a LaunchAgent |
There was a problem hiding this comment.
The function comment could be more descriptive about the unload strategy. Consider documenting that it attempts both modern (bootout) and legacy (unload) methods, and tries both user-level and system-level unloading to ensure complete cleanup across different macOS versions.
| # unload_agent - Helper function to unload a LaunchAgent | |
| # unload_agent - Helper function to unload a LaunchAgent. | |
| # Uses modern 'launchctl bootout' first and falls back to legacy 'launchctl unload', | |
| # attempting both user-level (sudo to console user) and system-level unloads to ensure | |
| # proper cleanup across different macOS versions. |
| // Acquire file lock before writing | ||
| lockPath := configPath + ".lock" | ||
| lockFile, err := acquireFileLock(lockPath) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to acquire config lock: %w", err) | ||
| } | ||
| defer releaseFileLock(lockFile) | ||
|
|
||
| // Write config atomically (write to temp file, then rename) | ||
| tempPath := configPath + ".tmp" | ||
| if err := os.WriteFile(tempPath, data, 0600); err != nil { | ||
| return fmt.Errorf("failed to write config: %w", err) | ||
| } | ||
|
|
||
| if err := os.Rename(tempPath, configPath); err != nil { | ||
| os.Remove(tempPath) // Clean up temp file on error | ||
| return fmt.Errorf("failed to rename config: %w", err) | ||
| } | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
The lock file created at lockPath is never cleaned up after successful operations. Over time, this could accumulate .lock files in the config directory. Consider adding cleanup logic to remove the lock file after the operation completes successfully, or document that these files are intentionally persistent.
| // acquireFileLock creates and locks a file, retrying for up to 5 seconds | ||
| func acquireFileLock(lockPath string) (*os.File, error) { | ||
| const maxRetries = 50 | ||
| const retryDelay = 100 * time.Millisecond | ||
|
|
||
| for i := 0; i < maxRetries; i++ { | ||
| lockFile, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0600) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // Try to acquire exclusive lock | ||
| err = syscall.Flock(int(lockFile.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) | ||
| if err == nil { | ||
| return lockFile, nil | ||
| } | ||
|
|
||
| lockFile.Close() | ||
|
|
||
| // If lock is held by another process, retry | ||
| // EAGAIN and EWOULDBLOCK are typically the same on Unix systems | ||
| if err == syscall.EAGAIN || err == syscall.EWOULDBLOCK { | ||
| time.Sleep(retryDelay) | ||
| continue | ||
| } | ||
|
|
||
| return nil, err | ||
| } | ||
|
|
||
| return nil, fmt.Errorf("timeout waiting for file lock") | ||
| } | ||
|
|
||
| // releaseFileLock unlocks and closes the lock file | ||
| // Errors are intentionally ignored as they are non-critical: | ||
| // - Unlock errors: advisory lock will be released on process exit | ||
| // - Close errors: file descriptor will be released by OS | ||
| func releaseFileLock(lockFile *os.File) { | ||
| if lockFile != nil { | ||
| _ = syscall.Flock(int(lockFile.Fd()), syscall.LOCK_UN) | ||
| _ = lockFile.Close() | ||
| } | ||
| } |
There was a problem hiding this comment.
The new file locking mechanism with retry logic lacks test coverage. Given that this code addresses race conditions between the GUI and background agent (a critical concurrency issue), tests should verify: 1) successful lock acquisition, 2) lock contention and retry behavior, 3) timeout scenarios, and 4) proper lock release. The existing config_test.go file has comprehensive test coverage for other config functions, so tests should be added for this new functionality.
| <!-- Logging --> | ||
| <key>StandardOutPath</key> | ||
| <string>/tmp/cassh-rotate.log</string> | ||
| <string>~/Library/Logs/cassh/cassh-rotate.log</string> |
There was a problem hiding this comment.
The tilde () character is not expanded by launchd in property list files. This will cause the log file to be created at a literal path named "/Library/Logs/cassh/cassh-rotate.log" (with the tilde character in the filename), rather than expanding to the user's home directory. The postinstall script creates the directory at the expanded path using $CONSOLE_HOME/Library/Logs/cassh, but launchd will attempt to write to a different location. Use an absolute path with an environment variable like /Users/$USER/Library/Logs/cassh/cassh-rotate.log or stick with /tmp for system-wide log directories, or consider using the EnvironmentVariables key with HOME and then reference $HOME/Library/Logs/cassh/cassh-rotate.log.
| <string>~/Library/Logs/cassh/cassh-rotate.log</string> | ||
| <key>StandardErrorPath</key> | ||
| <string>/tmp/cassh-rotate.log</string> | ||
| <string>~/Library/Logs/cassh/cassh-rotate.log</string> |
There was a problem hiding this comment.
The tilde () character is not expanded by launchd in property list files. This will cause the log file to be created at a literal path named "/Library/Logs/cassh/cassh-rotate.log" (with the tilde character in the filename), rather than expanding to the user's home directory. The postinstall script creates the directory at the expanded path using $CONSOLE_HOME/Library/Logs/cassh, but launchd will attempt to write to a different location. Use an absolute path with an environment variable like /Users/$USER/Library/Logs/cassh/cassh-rotate.log or stick with /tmp for system-wide log directories, or consider using the EnvironmentVariables key with HOME and then reference $HOME/Library/Logs/cassh/cassh-rotate.log.
| keyID := conn.GitHubKeyID | ||
| if keyID == "" { | ||
| // Try to find by current title | ||
| keyID = findGitHubKeyIDByTitle(getKeyTitle(conn.ID)) | ||
| if keyID == "" { | ||
| // Try to find by legacy title (for keys created before hostname was added) | ||
| keyID = findGitHubKeyIDByTitle(getLegacyKeyTitle(conn.ID)) | ||
| } | ||
| } | ||
| if keyID != "" { | ||
| if err := deleteSSHKeyFromGitHub(keyID); err != nil { | ||
| log.Printf("Warning: failed to delete old key (ID: %s): %v", keyID, err) | ||
| // Continue anyway - we still want to generate a new key | ||
| } else { | ||
| log.Printf("Deleted old GitHub SSH key (ID: %s)", keyID) | ||
| } | ||
| } |
There was a problem hiding this comment.
The variable keyID shadows the connection's GitHubKeyID field throughout this function. While the logic is correct, consider using a different variable name (e.g., oldKeyID or keyToDelete) for the local variable to avoid confusion and make the code more maintainable, especially since conn.GitHubKeyID is updated later at line 2299.
| // Try to find by current title | ||
| keyID = findGitHubKeyIDByTitle(getKeyTitle(conn.ID)) | ||
| if keyID == "" { | ||
| // Try to find by legacy title (for keys created before hostname was added) |
There was a problem hiding this comment.
Consider adding a comment explaining why the legacy title format is checked (for backward compatibility with keys created before the hostname suffix was added). This would make the code's intent clearer for future maintainers.
| // Try to find by legacy title (for keys created before hostname was added) | |
| // Fall back to the legacy title format for backward compatibility with | |
| // keys that were created before the hostname suffix was added. |
| return lockFile, nil | ||
| } | ||
|
|
||
| lockFile.Close() |
There was a problem hiding this comment.
File handle may be writable as a result of data flow from a call to OpenFile and closing it may result in data loss upon failure, which is not handled explicitly.
| lockFile.Close() | |
| if closeErr := lockFile.Close(); closeErr != nil { | |
| fmt.Printf("warning: failed to close lock file %s: %v\n", lockPath, closeErr) | |
| } |
|
fizzy |
Description
Addresses code review feedback from PR #7. Fixes race conditions, LaunchAgent misconfigurations, and improves reliability of key matching and error handling.
Configuration Safety
syscall.Flockto prevent concurrent config writes (GUI + background agent)LaunchAgent Fixes
RunAtLoad(was triggering immediate execution on load)LaunchOnlyOnce: false(allowed concurrent runs)/tmp→~/Library/Logs/cassh/(survives reboots)$HOME→~(launchd-compatible)Backward Compatibility
rotatePersonalGitHubSSH:cassh-{connID}@{hostname}cassh-{connID}Key Matching Reliability
findGitHubKeyIDByTitle: exact field match instead of substringCode Quality
unload_agent()function (eliminates duplication in postinstall)Type of Change
Related Issues
Addresses feedback from PR #7
How Has This Been Tested?
Test environment:
Checklist
Security Considerations
CodeQL scan: 0 alerts
Screenshots (if applicable)
N/A
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.