55 changes: 47 additions & 8 deletions .github/copilot-instructions.md
````diff
@@ -10,6 +10,7 @@ Brokli (a play on "broken links") is a CLI tool for checking broken links on web
 - `pkg/fetcher/`: HTTP operations (`GetHTML`)
 - `pkg/parser/`: HTML/XML parsing (`html.go`, `sitemap.go`)
 - `pkg/resolver/`: URL resolution logic (`ResolveAbsoluteUrl`, `IsSpecialLink`)
+- `pkg/output/`: Output formatting with interface-based design (`Formatter` interface, `DefaultFormatter`, `VerboseFormatter`, `GitHubFormatter`)
 
 **Target Use Case**: Local development workflow - developers run `brokli check url https://localhost:3000` or `brokli check sitemap https://localhost:3000/sitemap.xml` to validate links before deployment.
 
````
````diff
@@ -18,7 +19,8 @@ Brokli (a play on "broken links") is a CLI tool for checking broken links on web
 - ✅ **Concurrent HTTP Checking** - Worker pool with configurable workers (default: 10)
 - ✅ **Progress Indication** - Real-time counter with thread-safe serial callback
 - ✅ **Colored Terminal Output** - Status code coloring (green/red/cyan/yellow)
-- ✅ **Verbose Mode** - `--verbose/-v` flag to show all links vs broken only
+- ✅ **Multiple Output Formats** - `--output-format` flag with `default`, `verbose`, and `github` options
+- ✅ **GitHub Actions Integration** - Native support with workflow annotations and step outputs
 - ✅ **Smart Filtering** - Display broken links (4xx/5xx) by default
 - ✅ **URL & Sitemap Support** - Check single pages or entire sitemaps
 - ✅ **Comprehensive Testing** - 94%+ test coverage with race detection
````
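The smart-filtering and status-coloring features listed above come down to classifying a status code. The stdlib-only Go sketch below illustrates that classification; it is not Brokli's code. `Link`, `statusColor`, and `countBroken` are hypothetical names, colors are returned as plain strings rather than `github.com/fatih/color` values, and treating the `-1` (unchecked/error) state as "not broken" is an assumption drawn from the "4xx/5xx" wording.

```go
package main

import "fmt"

// Link is a hypothetical stand-in for link.AnchorTag with only the fields used here.
type Link struct {
	URL    string
	Status int // -1 marks an unchecked link or a failed request
}

// isBrokenLink reports whether a status is a client (4xx) or server (5xx) error.
func isBrokenLink(status int) bool {
	return status >= 400 && status <= 599
}

// statusColor maps a status code to the color scheme described in this change.
// It returns a color name instead of a *color.Color to stay stdlib-only.
func statusColor(status int) string {
	switch {
	case status == -1:
		return "yellow"
	case status >= 200 && status <= 299:
		return "green"
	case status >= 300 && status <= 399:
		return "cyan"
	case status >= 400 && status <= 499:
		return "red"
	case status >= 500 && status <= 599:
		return "bold red"
	default:
		return "none"
	}
}

// countBroken counts the links the default format would display.
func countBroken(links []Link) int {
	n := 0
	for _, l := range links {
		if isBrokenLink(l.Status) {
			n++
		}
	}
	return n
}

func main() {
	links := []Link{
		{"http://localhost:3000/", 200},
		{"http://localhost:3000/old", 301},
		{"http://localhost:3000/missing", 404},
		{"http://localhost:3000/crash", 500},
		{"http://unreachable.invalid/", -1},
	}
	for _, l := range links {
		fmt.Printf("[%d] %s %s (broken: %v)\n", l.Status, statusColor(l.Status), l.URL, isBrokenLink(l.Status))
	}
	fmt.Printf("%d broken out of %d total\n", countBroken(links), len(links))
}
```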
````diff
@@ -33,7 +35,8 @@ Brokli (a play on "broken links") is a CLI tool for checking broken links on web
 - `pkg/fetcher`: Only HTTP fetching
 - `pkg/parser`: Only parsing (HTML/XML → data structures)
 - `pkg/resolver`: Only URL resolution
-- Import packages with descriptive names: `fetcher.GetHTML()`, `parser.ParseHTML()`, `resolver.ResolveAbsoluteUrl()`
+- `pkg/output`: Only output formatting (interface-based design)
+- Import packages with descriptive names: `fetcher.GetHTML()`, `parser.ParseHTML()`, `resolver.ResolveAbsoluteUrl()`, `output.NewFormatter()`
 
 ### Error Handling
 - Functions return errors wrapped with context: `fmt.Errorf("failed to parse URL '%s': %w", urlStr, err)`
````
````diff
@@ -60,19 +63,39 @@ Brokli (a play on "broken links") is a CLI tool for checking broken links on web
 - Checker package operates on `link.AnchorTag` and `link.SitemapUrl` to set Status fields
 
 ### Output & Display
+- **Output Formatting** uses interface-based design in `pkg/output/`:
+  - `Formatter` interface defines `FormatPageResults()` and `FormatSitemapResults()`
+  - Three implementations: `DefaultFormatter`, `VerboseFormatter`, `GitHubFormatter`
+  - Factory function: `output.NewFormatter(format)` creates appropriate formatter
+  - String constants for format names: `output.FormatNameDefault`, `output.FormatNameVerbose`, `output.FormatNameGitHub`
+- **CLI Flags**:
+  - `--output-format` (or `-o`): Choose output format: `default`, `verbose`, or `github`
+  - `-v/--verbose`: Deprecated but maintained for backward compatibility, sets format to verbose
+- **Default Format** (`output.FormatDefault`):
+  - Shows only broken links (4xx/5xx status codes)
+  - Color-coded status codes
+  - Summary with count of broken links
+- **Verbose Format** (`output.FormatVerbose`):
+  - Shows all links with their status codes
+  - Includes working links (2xx) and redirects (3xx)
+  - Full summary statistics
+- **GitHub Actions Format** (`output.FormatGitHub`):
+  - Emits workflow annotations: `::error title="Broken Link (Status 404)"::Link 'text' to url returned status 404`
+  - Writes to `GITHUB_OUTPUT` environment file (if set) with secure 0600 permissions
+  - Step outputs: `broken_links_count`, `total_links_count`, `has_broken_links`
+  - Simplified text summary for stdout
 - Color-coded status using `github.com/fatih/color`:
   - Green: 2xx success codes
   - Cyan: 3xx redirect codes
   - Red: 4xx client errors
   - Bold Red: 5xx server errors
   - Yellow: -1 unchecked/error state
-- Smart filtering: By default shows only broken links (4xx/5xx)
-- Verbose mode (`--verbose/-v`): Shows all links with status codes
-- Helper functions in `cmd/check.go`:
+- Helper functions in `pkg/output/utils.go`:
   - `getStatusIcon()`: Returns emoji/symbol for status
   - `getColoredStatus()`: Returns colored status code string
   - `isBrokenLink()`: Determines if link should be displayed by default
-- Per-command flag retrieval (no global variables)
+  - `countBrokenLinks()`: Counts broken links in slice
+  - `countBrokenSitemapUrls()`: Counts broken sitemap URLs
 
 ### Testing Patterns
 - Test files mirror source files: `fetcher_test.go`, `resolver_test.go`, `html_test.go`, `sitemap_test.go`
````
````diff
@@ -155,6 +178,15 @@ Use VS Code launch configurations (`.vscode/launch.json`):
 2. Add tests in `resolver_test.go` with expected empty URL behavior
 3. Update integration tests in parser package if needed
 
+**Adding a new output format:**
+1. Create new formatter struct in `pkg/output/` (e.g., `JSONFormatter`)
+2. Implement the `Formatter` interface with `FormatPageResults()` and `FormatSitemapResults()` methods
+3. Add new format constant to `pkg/output/output.go` (e.g., `FormatJSON Format = "json"`, `FormatNameJSON = "json"`)
+4. Update `NewFormatter()` factory function to handle new format
+5. Add comprehensive tests in `output_test.go` for the new formatter
+6. Update CLI flag help text in `cmd/check.go` to include new format option
+7. Document the new format in README.md
+
 **Modifying data structures:**
 - Data types are in `pkg/link/link.go` - keep them pure (no business logic)
 - Add validation in constructor functions in `pkg/parser/` (e.g., `newAnchorTag`, `newSitemapUrl`)
````
````diff
@@ -163,6 +195,13 @@ Use VS Code launch configurations (`.vscode/launch.json`):
 **Adding colored output to new commands:**
 - Import `github.com/fatih/color` for terminal coloring
 - Define color schemes: `color.New(color.FgGreen)`, `color.New(color.FgRed, color.Bold)`
-- Use helper functions: `getStatusIcon()`, `getColoredStatus()` for consistency
+- Use helper functions from `pkg/output/utils.go`: `getStatusIcon()`, `getColoredStatus()` for consistency
 - Show summary statistics: total links, broken links, redirects
-- Follow existing pattern in `cmd/check.go` for consistent UX
+- Follow existing formatter pattern in `pkg/output/` for consistent UX
+
+**Working with output formatters:**
+- All formatters write to `io.Writer` (typically `os.Stdout`)
+- Use `output.NewFormatter(format)` factory to get appropriate formatter instance
+- Format selection logic in `cmd/check.go` uses string constants from `output` package
+- Backward compatibility: `-v` flag still sets format to verbose (deprecated pattern)
+- GitHub formatter handles `GITHUB_OUTPUT` environment variable automatically
````
90 changes: 89 additions & 1 deletion README.md
````diff
@@ -11,6 +11,7 @@ Brokli (a play on "broken links") is a CLI tool that helps developers validate a
 - 📊 **Progress Indication** - Real-time progress counter shows checking status
 - 🎯 **Smart Filtering** - Shows only broken links by default to reduce noise
 - 🔍 **Verbose Mode** - Optional flag to display all links with their status codes
+- 🔄 **GitHub Actions Integration** - Native support with workflow annotations and step outputs
 - ⚙️ **Configurable** - Adjust workers, timeouts, redirects, and user agent
 - 🌐 **Sitemap Support** - Check entire sitemaps with metadata display
 - 🧪 **Well Tested** - 94%+ test coverage with comprehensive test suite
````
````diff
@@ -80,8 +81,10 @@ Checking links... 10/10
 Show all links with their status codes:
 
 ```bash
-brokli check url https://example.com --verbose
+brokli check url https://example.com --output-format=verbose
 # or use the short flag
+brokli check url https://example.com -o verbose
+# backward compatible with the old -v flag
 brokli check url https://example.com -v
 ```
 
````
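The relationship between `--output-format`/`-o` and the deprecated `-v` flag can be sketched with Go's standard `flag` package. Treat this as an illustration with several assumptions. Brokli's actual flag library is not shown in this diff, and the precedence rule (an explicit `-o` wins over `-v`) is a guess, not documented behavior. `parseCheckFlags` and `resolveFormat` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// resolveFormat validates the chosen format. Assumption: the deprecated -v flag
// only applies when --output-format/-o was not given explicitly.
func resolveFormat(format string, formatSet, verbose bool) (string, error) {
	if verbose && !formatSet {
		return "verbose", nil
	}
	switch format {
	case "default", "verbose", "github":
		return format, nil
	default:
		return "", fmt.Errorf("invalid output format '%s': want default, verbose, or github", format)
	}
}

// parseCheckFlags is a hypothetical parser for the check command's flags.
func parseCheckFlags(args []string) (string, error) {
	fs := flag.NewFlagSet("check", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	var format string
	var verbose bool
	fs.StringVar(&format, "output-format", "default", "output format: default, verbose, or github")
	fs.StringVar(&format, "o", "default", "shorthand for --output-format")
	fs.BoolVar(&verbose, "verbose", false, "deprecated: same as --output-format=verbose")
	fs.BoolVar(&verbose, "v", false, "deprecated shorthand for --verbose")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Visit only walks flags that were actually set on the command line.
	formatSet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "output-format" || f.Name == "o" {
			formatSet = true
		}
	})
	return resolveFormat(format, formatSet, verbose)
}

func main() {
	for _, args := range [][]string{{}, {"-v"}, {"-o", "github"}, {"--output-format=xml"}} {
		format, err := parseCheckFlags(args)
		fmt.Printf("%q -> %q (err: %v)\n", args, format, err)
	}
}
```

Go's `flag` package stops parsing at the first positional argument. Supporting flags after the URL, as in `brokli check url https://example.com -o verbose`, would need a parser that allows interspersed flags.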
````diff
@@ -100,6 +103,27 @@ All Links:
 Summary: 1 broken link found out of 10 total
 ```
 
+### Output Formats
+
+Brokli supports multiple output formats via the `--output-format` (or `-o`) flag:
+
+- **`default`** - Shows only broken links with color-coded status (default behavior)
+- **`verbose`** - Shows all links with their status codes
+- **`github`** - GitHub Actions compatible format with workflow annotations and step outputs
+
+**Examples:**
+
+```bash
+# Default format (broken links only)
+brokli check url https://example.com
+
+# Verbose format (all links)
+brokli check url https://example.com -o verbose
+
+# GitHub Actions format
+brokli check url https://example.com -o github
+```
+
 ### Check a Sitemap
 
 Check all URLs in a sitemap:
````
````diff
@@ -121,6 +145,59 @@ Broken URLs:
 Summary: 2 broken URLs found out of 70 total
 ```
 
+### GitHub Actions Integration
+
+Use Brokli in GitHub Actions CI/CD workflows with the `--output-format=github` flag:
+
+```bash
+brokli check url https://example.com --output-format=github
+brokli check sitemap https://example.com/sitemap.xml --output-format=github
+# Or use the short form
+brokli check url https://example.com -o github
+```
+
+This formats output specifically for GitHub Actions:
+- **Workflow Annotations**: Broken links appear as errors in the GitHub Actions UI
+- **Step Outputs**: Summary statistics are written to `GITHUB_OUTPUT` for use in downstream steps
+- **Exit Code**: Standard exit codes for CI integration
+
+**Example GitHub Actions Workflow:**
+
+```yaml
+name: Check Links
+on: [push, pull_request]
+
+jobs:
+  check-links:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Brokli
+        run: |
+          curl -L https://github.com/EndlessTrax/brokli/releases/latest/download/brokli_linux_amd64.tar.gz | tar xz
+          sudo mv brokli /usr/local/bin/
+
+      - name: Check site links
+        id: check
+        run: brokli check url https://yoursite.com --output-format=github
+        continue-on-error: true
+
+      - name: Check results
+        run: |
+          echo "Total links: ${{ steps.check.outputs.total_links_count }}"
+          echo "Broken links: ${{ steps.check.outputs.broken_links_count }}"
+          if [ "${{ steps.check.outputs.has_broken_links }}" == "true" ]; then
+            echo "⚠️ Broken links found!"
+            exit 1
+          fi
+```
+
+**Step Outputs Available:**
+- `broken_links_count` - Number of broken links found
+- `total_links_count` - Total number of links checked
+- `has_broken_links` - Boolean (`true`/`false`) indicating if any broken links were found
+
 ### Status Code Colors
 
 - 🟢 **Green** `[200]` - Success (2xx status codes)
````
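The workflow annotations are `::error` workflow commands printed to stdout. The sketch below shows how one could be built for a broken link. It follows the escaping rules used by GitHub's `actions/toolkit`: `%`, CR, and LF are escaped in the message, and `:` and `,` are also escaped in property values such as `title`. The function names are hypothetical. The message wording mirrors the example in `.github/copilot-instructions.md`, and the title is left unquoted.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeData escapes a workflow-command message (%, CR, LF).
func escapeData(s string) string {
	return strings.NewReplacer("%", "%25", "\r", "%0D", "\n", "%0A").Replace(s)
}

// escapeProperty escapes a property value such as title=; ':' and ',' are reserved too.
func escapeProperty(s string) string {
	return strings.NewReplacer("%", "%25", "\r", "%0D", "\n", "%0A", ":", "%3A", ",", "%2C").Replace(s)
}

// errorAnnotation builds an ::error workflow command for one broken link.
func errorAnnotation(text, url string, status int) string {
	title := fmt.Sprintf("Broken Link (Status %d)", status)
	msg := fmt.Sprintf("Link '%s' to %s returned status %d", text, url, status)
	return fmt.Sprintf("::error title=%s::%s", escapeProperty(title), escapeData(msg))
}

func main() {
	fmt.Println(errorAnnotation("Docs", "https://example.com/docs", 404))
}
```

`strings.NewReplacer` scans the input once and never re-examines replaced text. This prevents the `%` in `%25` from being escaped a second time.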
````diff
@@ -149,6 +226,17 @@ Quick validation before deploying to production:
 brokli check sitemap https://staging.example.com/sitemap.xml -v
 ```
 
+### CI/CD Integration
+
+Automate link checking in your GitHub Actions workflow:
+
+```bash
+# In GitHub Actions workflow
+brokli check url https://example.com --output-format=github
+```
+
+See the [GitHub Actions Integration](#github-actions-integration) section for complete workflow examples.
+
 ## Roadmap
 
 ### Planned Features
````