106 changes: 106 additions & 0 deletions ci/README-cleanup-pr-previews.md
@@ -0,0 +1,106 @@
# PR Preview Cleanup Script

## Overview

This script (`cleanup-pr-previews`) helps maintain the `gh-pages` branch by cleaning up documentation preview folders for PRs that have been closed or merged.

## Problem

The current `doc_preview` action can leave stale preview folders behind because:

1. Cleanup steps only run when the target branch is `main`, so PRs targeting feature branches are never cleaned up
2. Canceled or interrupted documentation jobs skip the cleanup steps
3. Other edge cases prevent the cleanup logic from executing

This results in a mismatch between the number of `pr-XXXXX` folders in `docs/pr-preview/` and the actual number of open PRs.

## Solution

The `cleanup-pr-previews` script:

1. Fetches all `pr-XXXXX` folders from the `docs/pr-preview/` directory in the `gh-pages` branch
2. For each folder, extracts the PR number and checks its status via GitHub API
3. Marks a folder as stale when its PR is closed or merged, or when the PR cannot be found (for example, because it was deleted)
4. Removes the stale folders and commits the changes back to `gh-pages`
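
Steps 2 and 3 can be sketched as two small helpers. This is a minimal illustration rather than the script's actual code: the function names `pr_number_from_folder` and `should_remove` are hypothetical, and the state strings assume the `open`/`closed` values reported by the GitHub pulls API plus a `not_found` placeholder for PRs that cannot be looked up.

```bash
# Hypothetical helper: print the PR number for a folder named pr-<digits>,
# or return non-zero if the name does not match that pattern.
pr_number_from_folder() {
  case "$1" in
    pr-*) ;;
    *) return 1 ;;
  esac
  num="${1#pr-}"
  case "$num" in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$num"
}

# Hypothetical helper: succeed (exit 0) when a preview folder is stale.
should_remove() {
  case "$1" in
    closed|not_found) return 0 ;;
    *) return 1 ;;   # open or unknown states are kept
  esac
}

# Demo on folder names only; no API calls are made here.
for folder in pr-415 pr-1021 index.html; do
  if n=$(pr_number_from_folder "$folder"); then
    echo "$folder -> PR #$n"
  else
    echo "$folder -> ignored"
  fi
done
```

Keeping unknown states on the "keep" side means an unexpected API response never causes a deletion.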

## Usage

### Prerequisites

- `GH_TOKEN` environment variable with appropriate permissions
- GitHub CLI (`gh`) installed and authenticated
- `jq` installed for JSON parsing
- `git` available
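
To check these prerequisites by hand before a first run, something like the following may help. It is a sketch, not part of the script: `missing_tools` is a hypothetical helper name, and it only verifies that the commands are on `PATH`, not that `gh` is authenticated.

```bash
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools gh jq git)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi

# The token check mirrors the requirement listed above.
[ -n "${GH_TOKEN:-}" ] || echo "GH_TOKEN is not set" >&2
```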

### Basic Usage

```bash
# Preview what would be cleaned up (recommended first run)
./ci/cleanup-pr-previews NVIDIA/cuda-python true

# Actually perform the cleanup
./ci/cleanup-pr-previews NVIDIA/cuda-python false

# Use defaults (NVIDIA/cuda-python, actual cleanup)
./ci/cleanup-pr-previews
```

### Parameters

1. **repository** (optional): GitHub repository in `owner/repo` format. Default: `NVIDIA/cuda-python`
2. **dry-run** (optional): Set to `true` to preview changes without making them. Only the exact string `true` enables dry-run mode; any other value performs the cleanup. Default: `false`
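
Because any value other than `true` triggers a real cleanup, a stricter check on the second argument could guard against typos such as `True` or `yes`. The sketch below shows one possible approach; `parse_dry_run` is a hypothetical function that the script does not currently define.

```bash
# Hypothetical helper: accept only 'true' or 'false' (default: false),
# and reject anything else with a non-zero status.
parse_dry_run() {
  case "${1:-false}" in
    true|false)
      printf '%s\n' "${1:-false}"
      ;;
    *)
      echo "dry-run must be 'true' or 'false', got: $1" >&2
      return 2
      ;;
  esac
}

parse_dry_run true                    # prints: true
parse_dry_run                         # prints: false (the documented default)
parse_dry_run yes || echo "rejected"  # prints an error to stderr, then: rejected
```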

### Examples

```bash
# Preview cleanup for the main repository
./ci/cleanup-pr-previews NVIDIA/cuda-python true

# Clean up a different repository
./ci/cleanup-pr-previews myorg/my-repo false

# Show help
./ci/cleanup-pr-previews --help
```

## Sample Output

```
[INFO] Checking prerequisites...
[INFO] All prerequisites satisfied
[INFO] Fetching PR preview folders from gh-pages branch...
[INFO] Found 44 PR preview folders
[CHECK] Checking PR #415...
[REMOVE] PR #415 is closed
[CHECK] Checking PR #1021...
[KEEP] PR #1021 is still open
...

[SUMMARY]
Total PR preview folders: 44
Open PRs: 17
Folders to remove: 27

[FOLDERS TO REMOVE]
- pr-415 (PR #415)
- pr-435 (PR #435)
...

[CLEANUP] Proceeding to remove 27 folders...
[INFO] Cloning gh-pages branch to temporary directory...
[REMOVE] Removing docs/pr-preview/pr-415
...
[INFO] Committing changes...
[INFO] Pushing to gh-pages branch...
[SUCCESS] Cleanup completed! Removed 27 PR preview folders
```

## Security Considerations

- The script requires write access to the repository to modify the `gh-pages` branch
- Always run with the dry-run argument set to `true` first to verify the expected behavior
- The script clones the `gh-pages` branch into a temporary directory, which is removed automatically when the script exits
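
The automatic cleanup relies on the common `mktemp` plus `trap ... EXIT` pattern. The following standalone sketch shows the idea in isolation; it uses a subshell so the effect is visible immediately, whereas the script sets the trap for its whole process.

```bash
tmp=$(mktemp -d)

(
  # The EXIT trap fires when this subshell ends, whether it succeeds or fails.
  trap 'rm -rf "$tmp"' EXIT
  touch "$tmp/scratch-file"
  echo "working in a temporary directory"
)

if [ -d "$tmp" ]; then
  echo "temporary directory still exists"
else
  echo "temporary directory was cleaned up"
fi
```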

## Future Enhancements

This script could be integrated into a scheduled GitHub Actions workflow to run periodically (e.g., weekly) to automatically maintain the `gh-pages` branch.
206 changes: 206 additions & 0 deletions ci/cleanup-pr-previews
@@ -0,0 +1,206 @@
#!/usr/bin/env bash

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

# A utility script to clean up PR preview documentation folders for closed/merged PRs.
# This script checks all pr-XXXXX folders in the gh-pages branch docs/pr-preview/ directory,
# verifies if the corresponding PR XXXXX is still open, and removes preview folders
# for PRs that have been closed or merged.

set -euo pipefail

# Configuration
REPOSITORY="${1:-NVIDIA/cuda-python}"
DRY_RUN="${2:-false}"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Usage information
usage() {
    cat << EOF
Usage: $0 [repository] [dry-run]
    repository: GitHub repository (default: NVIDIA/cuda-python)
    dry-run:    Set to 'true' to preview what would be deleted without actually deleting (default: false)

Examples:
    $0                              # Clean up NVIDIA/cuda-python with actual deletions
    $0 NVIDIA/cuda-python true      # Preview what would be cleaned up
    $0 myorg/my-repo false          # Clean up a different repository

Requirements:
    - GH_TOKEN environment variable must be set
    - 'gh' (GitHub CLI) must be installed and authenticated
    - 'jq' must be installed for JSON parsing
    - 'git' must be available
EOF
    # usage is only reached via -h/--help, so exit successfully
    exit 0
}

# Check for help flag
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
    usage
fi

# Validate required tools and environment
echo -e "${YELLOW}[INFO]${NC} Checking prerequisites..."

if [[ -z "${GH_TOKEN:-}" ]]; then
    echo -e "${RED}[ERROR]${NC} GH_TOKEN environment variable is required" >&2
    exit 1
fi

if ! command -v jq >/dev/null 2>&1; then
    echo -e "${RED}[ERROR]${NC} jq is required but not installed" >&2
    exit 1
fi

if ! command -v gh >/dev/null 2>&1; then
    echo -e "${RED}[ERROR]${NC} GitHub CLI (gh) is required but not installed" >&2
    exit 1
fi

if ! command -v git >/dev/null 2>&1; then
    echo -e "${RED}[ERROR]${NC} git is required but not installed" >&2
    exit 1
fi

echo -e "${GREEN}[INFO]${NC} All prerequisites satisfied"

# Fetch PR preview folders from gh-pages branch
echo -e "${YELLOW}[INFO]${NC} Fetching PR preview folders from gh-pages branch..."

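# Get the list of pr-XXXXX folders from the gh-pages branch.
# The ref goes in the query string: passing it with --field would make
# 'gh api' default to a POST request instead of a GET.
# The fallback is assigned explicitly so that an error body written to
# stdout cannot be mistaken for a folder list.
if ! PR_FOLDERS=$(gh api "repos/${REPOSITORY}/contents/docs/pr-preview?ref=gh-pages" \
    --header "Accept: application/vnd.github+json" \
    --jq '.[] | select(.type == "dir" and (.name | test("^pr-[0-9]+$"))) | .name' \
    2>/dev/null); then
    PR_FOLDERS=""
fi

if [[ -z "$PR_FOLDERS" ]]; then
    echo -e "${YELLOW}[INFO]${NC} No PR preview folders found in gh-pages branch"
    exit 0
fi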

echo -e "${GREEN}[INFO]${NC} Found $(echo "$PR_FOLDERS" | wc -l) PR preview folders"

# Check each PR folder
FOLDERS_TO_REMOVE=()
TOTAL_FOLDERS=0
OPEN_PRS=0

while IFS= read -r folder; do
    if [[ -z "$folder" ]]; then
        continue
    fi

    TOTAL_FOLDERS=$((TOTAL_FOLDERS + 1))

    # Extract PR number from folder name (pr-XXXXX -> XXXXX)
    PR_NUMBER="${folder#pr-}"

    echo -e "${YELLOW}[CHECK]${NC} Checking PR #${PR_NUMBER}..."

    # Check PR status using the GitHub API. The fallback is assigned explicitly
    # (not echoed) so that any error output on stdout cannot end up in PR_STATUS.
    # Note: any API failure, not only a missing PR, is reported as "not_found".
    if ! PR_STATUS=$(gh api "repos/${REPOSITORY}/pulls/${PR_NUMBER}" \
        --header "Accept: application/vnd.github+json" \
        --jq '.state' 2>/dev/null); then
        PR_STATUS="not_found"
    fi

    case "$PR_STATUS" in
        "open")
            echo -e "${GREEN}[KEEP]${NC} PR #${PR_NUMBER} is still open"
            OPEN_PRS=$((OPEN_PRS + 1))
            ;;
        "closed")
            # Merged PRs also report the state "closed"
            echo -e "${RED}[REMOVE]${NC} PR #${PR_NUMBER} is closed"
            FOLDERS_TO_REMOVE+=("$folder")
            ;;
        "not_found")
            echo -e "${RED}[REMOVE]${NC} PR #${PR_NUMBER} not found (may have been deleted)"
            FOLDERS_TO_REMOVE+=("$folder")
            ;;
        *)
            # Unknown states are kept, never removed
            echo -e "${YELLOW}[UNKNOWN]${NC} PR #${PR_NUMBER} has unexpected status: ${PR_STATUS}"
            ;;
    esac
done <<< "$PR_FOLDERS"

# Summary
echo ""
echo -e "${YELLOW}[SUMMARY]${NC}"
echo "Total PR preview folders: ${TOTAL_FOLDERS}"
echo "Open PRs: ${OPEN_PRS}"
echo "Folders to remove: ${#FOLDERS_TO_REMOVE[@]}"

if [[ ${#FOLDERS_TO_REMOVE[@]} -eq 0 ]]; then
    echo -e "${GREEN}[INFO]${NC} No cleanup needed - all preview folders correspond to open PRs"
    exit 0
fi

# List folders to be removed
echo ""
echo -e "${YELLOW}[FOLDERS TO REMOVE]${NC}"
for folder in "${FOLDERS_TO_REMOVE[@]}"; do
    pr_num="${folder#pr-}"
    echo "  - $folder (PR #${pr_num})"
done

# Perform cleanup or show what would be done
echo ""
if [[ "$DRY_RUN" == "true" ]]; then
    echo -e "${YELLOW}[DRY RUN]${NC} Would remove ${#FOLDERS_TO_REMOVE[@]} folders (pass 'false' as the dry-run argument to actually remove them)"
else
    echo -e "${RED}[CLEANUP]${NC} Proceeding to remove ${#FOLDERS_TO_REMOVE[@]} folders..."

    # Clone gh-pages branch to a temporary directory
    TEMP_DIR=$(mktemp -d)
    trap 'rm -rf "$TEMP_DIR"' EXIT

    echo -e "${YELLOW}[INFO]${NC} Cloning gh-pages branch to temporary directory..."
    # Report clone failures instead of exiting silently under 'set -e'
    if ! git clone --depth 1 --branch gh-pages "https://github.com/${REPOSITORY}.git" "$TEMP_DIR" >/dev/null 2>&1; then
        echo -e "${RED}[ERROR]${NC} Failed to clone the gh-pages branch of ${REPOSITORY}" >&2
        exit 1
    fi

    cd "$TEMP_DIR"

    # Configure git for the cleanup commit
    git config user.name "cuda-python-bot"
    git config user.email "[email protected]"

    # Remove each folder, recording the ones that were actually present
    REMOVED_FOLDERS=()
    for folder in "${FOLDERS_TO_REMOVE[@]}"; do
        folder_path="docs/pr-preview/$folder"

        if [[ -d "$folder_path" ]]; then
            echo -e "${YELLOW}[REMOVE]${NC} Removing $folder_path"
            rm -rf "$folder_path"
            git add "$folder_path"
            REMOVED_FOLDERS+=("$folder")
        else
            echo -e "${YELLOW}[SKIP]${NC} Folder $folder_path not found locally"
        fi
    done
    REMOVED_COUNT=${#REMOVED_FOLDERS[@]}

    if [[ $REMOVED_COUNT -gt 0 ]]; then
        # Commit and push changes. The message body lines are intentionally
        # unindented so the commit message has no leading whitespace.
        commit_message="Clean up PR preview folders for ${REMOVED_COUNT} closed/merged PRs

Removed preview folders for the following PRs:
$(printf '%s\n' "${REMOVED_FOLDERS[@]}" | sed 's/^pr-/- PR #/' | head -20)
$(if [[ $REMOVED_COUNT -gt 20 ]]; then echo "... and $((REMOVED_COUNT - 20)) more"; fi)"

        echo -e "${YELLOW}[INFO]${NC} Committing changes..."
        git commit -m "$commit_message"

        echo -e "${YELLOW}[INFO]${NC} Pushing to gh-pages branch..."
        git push origin gh-pages

        echo -e "${GREEN}[SUCCESS]${NC} Cleanup completed! Removed ${REMOVED_COUNT} PR preview folders"
    else
        echo -e "${YELLOW}[INFO]${NC} No folders were actually removed (they may have been cleaned up already)"
    fi
fi
56 changes: 56 additions & 0 deletions ci/test-cleanup-pr-previews
@@ -0,0 +1,56 @@
#!/usr/bin/env bash

# Simple test script for cleanup-pr-previews
# This tests the basic argument parsing and help functionality

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CLEANUP_SCRIPT="${SCRIPT_DIR}/cleanup-pr-previews"

echo "Testing PR preview cleanup script..."

# Test 1: Help functionality
echo "Test 1: Help functionality"
# Quote the script path so directories containing spaces work
help_output=$("$CLEANUP_SCRIPT" --help 2>&1 || true)
if echo "$help_output" | grep -q "Usage:"; then
    echo "✓ Help functionality works"
else
    echo "✗ Help functionality failed"
    exit 1
fi

# Test 2: Script is executable
echo "Test 2: Script executability"
if [[ -x "$CLEANUP_SCRIPT" ]]; then
    echo "✓ Script is executable"
else
    echo "✗ Script is not executable"
    exit 1
fi

# Test 3: Script handles missing GH_TOKEN
echo "Test 3: Missing GH_TOKEN handling"
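# Force GH_TOKEN to be empty for this invocation. Without this, a token already
# present in the environment would let the script run a real cleanup.
error_output=$(GH_TOKEN="" "$CLEANUP_SCRIPT" 2>&1 || true)
if echo "$error_output" | grep -q "GH_TOKEN environment variable is required"; then
    echo "✓ Missing GH_TOKEN handled correctly"
else
    echo "✗ Missing GH_TOKEN not handled correctly"
    exit 1
fi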

# Test 4: Check basic shell syntax (bash -n parses the script without running it)
echo "Test 4: Shell syntax validation"
if bash -n "$CLEANUP_SCRIPT"; then
    echo "✓ Shell syntax is valid"
else
    echo "✗ Shell syntax errors detected"
    exit 1
fi

echo ""
echo "All tests passed! ✓"
echo ""
echo "To actually test the script functionality, run:"
echo " export GH_TOKEN=your_token"
echo " $CLEANUP_SCRIPT NVIDIA/cuda-python true"