[WIP] Invisible character filtering #426

Draft
wants to merge 5 commits into
base: main
Conversation

Copilot AI commented May 23, 2025

  • Create a new package for content filtering with functions to filter invisible characters and hidden content
  • Add a flag to enable/disable content filtering in the CLI
  • Update server configuration to include the content filtering flag
  • Modify issue/PR/comment handlers to apply filtering to body content
  • Write tests to verify filtering works correctly
  • Update documentation to mention the content filtering feature

Added a --disable-content-filtering flag to the CLI so that security researchers can bypass content filtering. Content filtering is enabled by default.
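For readers following along, the default-on behaviour described above could be sketched roughly as follows. This is only an illustration using Go's standard `flag` package; the `Config` type and `parseConfig` function are hypothetical names, and the PR's actual CLI wiring is not shown in this conversation and may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical server configuration; the PR's real struct is not shown.
type Config struct {
	ContentFiltering bool
}

// parseConfig reads the --disable-content-filtering flag from args.
// Filtering stays enabled unless the flag is passed explicitly.
func parseConfig(args []string) (Config, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	disable := fs.Bool("disable-content-filtering", false,
		"bypass content filtering (intended for security research)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return Config{ContentFiltering: !*disable}, nil
}

func main() {
	cfg, err := parseConfig([]string{"--disable-content-filtering"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("content filtering enabled:", cfg.ContentFiltering)
}
```

Expressing the flag as an opt-out keeps the safe behaviour as the default, which matches the description in this comment.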

Fixes #415.


@SamMorrowDrums
Collaborator

@copilot continue please

Author

Copilot AI commented May 23, 2025

@copilot continue please

I've implemented content filtering for issues, PRs, and comments. The implementation includes:

  1. Creating filters for all relevant GitHub objects (issues, PRs, comments)
  2. Applying filters in all handlers that return content
  3. Writing tests for the filtering functionality
  4. Updating documentation to explain the feature

The changes have been committed in 67d6012. All handlers now automatically filter content by default, but the feature can be disabled with the --disable-content-filtering flag.
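As a rough illustration of what applying a filter in a handler might look like, here is a minimal sketch. The `Issue` type, the function names, and the set of code points are all assumptions made for this example; the code in 67d6012 is not reproduced in this conversation and likely covers more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// invisible lists a few zero-width code points. This is an assumed,
// incomplete set chosen for illustration only.
var invisible = map[rune]bool{
	'\u200B': true, // zero width space
	'\u200C': true, // zero width non-joiner
	'\u200D': true, // zero width joiner
	'\u2060': true, // word joiner
	'\uFEFF': true, // zero width no-break space (BOM)
}

// filterInvisible drops the code points above and leaves everything else untouched.
func filterInvisible(s string) string {
	return strings.Map(func(r rune) rune {
		if invisible[r] {
			return -1 // strings.Map removes runes mapped to a negative value
		}
		return r
	}, s)
}

// Issue is a hypothetical stand-in for the GitHub objects the handlers return.
type Issue struct {
	Title string
	Body  string
}

// filterIssue returns a copy with its body filtered, unless filtering is disabled.
func filterIssue(in Issue, filteringEnabled bool) Issue {
	if !filteringEnabled {
		return in
	}
	in.Body = filterInvisible(in.Body)
	return in
}

func main() {
	issue := Issue{Title: "Bug", Body: "Hello\u200B wor\u200Dld"}
	fmt.Printf("%q\n", filterIssue(issue, true).Body)
	fmt.Printf("%q\n", filterIssue(issue, false).Body)
}
```

One caveat with a sketch like this: the zero width joiner is also used in legitimate emoji sequences and some scripts, so a production filter would need to decide how to treat those cases.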

smallTextRegex = regexp.MustCompile(`<[^>]*style="[^"]*font-size:\s*(?:0|0\.\d+|[0-3])(?:px|pt|em|%)[^"]*"[^>]*>[\s\S]*?</[^>]+>`)

// Excessive whitespace (more than 3 consecutive newlines)
excessiveWhitespaceRegex = regexp.MustCompile(`\n{4,}`)
Collaborator

@copilot perhaps also add a filter for excessive spaces and tabs too. Probably for spaces 15+ and for tabs 6+

Author

I've added filters for excessive spaces and tabs as requested. The implementation filters sequences of 15+ consecutive spaces and 6+ consecutive tabs, replacing them with normalized versions (14 spaces and 5 tabs respectively). Changes are in commit 2e27e2a.

Successfully merging this pull request may close these issues.

Invisible character filtering
2 participants