
Chrome AI RTD Provider: fix QuotaExceededError with large page content #14295

Open
drpaulfarrow wants to merge 3 commits into prebid:master from drpaulfarrow:pr/chrome-ai-rtd-fix


@drpaulfarrow

Type of change
[x] Bugfix

Description of change
Added MAX_TEXT_LENGTH constant (1000 chars) and text truncation logic in getPageText() to prevent QuotaExceededError when Chrome AI APIs process pages with large amounts of text content.

Root cause: Chrome's built-in AI APIs (LanguageDetector, Summarizer) have internal quotas on input size. When document.body.textContent exceeds these limits, the APIs throw QuotaExceededError exceptions, causing the RTD submodule to fail silently.

Changes:
Added MAX_TEXT_LENGTH: 1000 to CONSTANTS object
Updated getPageText() to truncate text exceeding the limit before passing to Chrome AI APIs
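The two changes above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `limitPageText` and its `log` parameter are hypothetical stand-ins (the real `getPageText()` reads from the DOM and logs through Prebid's `logMessage`), and only the constant names and the log wording come from this PR.

```javascript
// Hypothetical stand-ins: only the constant names and values are taken from the PR.
const CONSTANTS = Object.freeze({
  LOG_PRE_FIX: 'ChromeAI-Rtd-Provider:',
  MAX_TEXT_LENGTH: 1000, // cap on the input passed to the Chrome AI APIs
});

// Returns the text unchanged when it fits within MAX_TEXT_LENGTH,
// otherwise logs once and returns the first MAX_TEXT_LENGTH characters.
// `log` is an assumed injection point so the sketch runs outside a browser.
function limitPageText(text, log = console.log) {
  if (text.length > CONSTANTS.MAX_TEXT_LENGTH) {
    log(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
    return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
  }
  return text;
}

// Usage: a 50,000-character page is cut down to 1000 characters.
const limited = limitPageText('a'.repeat(50000));
console.log(limited.length); // 1000
```

Passing the text in as a parameter keeps the sketch independent of `document.body.textContent`, so it can be run in Node as well as in a browser.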

Testing:
gulp lint passes
gulp test --file test/spec/modules/chromeAiRtdProvider_spec.js passes
Manually verified on pages with 50k+ character content

Other information
This is a minimal, focused fix: only 6 lines are added. The 1000 character limit provides sufficient text for accurate language detection and keyword summarization while staying well within Chrome AI API quotas.

Copilot AI review requested due to automatic review settings December 23, 2025 10:11
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a QuotaExceededError that occurs when Chrome AI APIs (LanguageDetector, Summarizer) process pages with large amounts of text content by implementing text truncation in the getPageText() function.

  • Added MAX_TEXT_LENGTH constant set to 1000 characters
  • Implemented truncation logic to limit text before passing to Chrome AI APIs
  • Added logging message when text is truncated


LOG_PRE_FIX: 'ChromeAI-Rtd-Provider:',
STORAGE_KEY: 'chromeAi_detected_data', // Single key for both language and keywords
MIN_TEXT_LENGTH: 20,
MAX_TEXT_LENGTH: 1000, // Limit to prevent QuotaExceededError with Chrome AI APIs

Copilot AI Dec 23, 2025


The new MAX_TEXT_LENGTH constant lacks test coverage. The test file checks for MIN_TEXT_LENGTH at line 137 but doesn't verify that MAX_TEXT_LENGTH exists or that the truncation logic works correctly. Consider adding tests to verify: (1) MAX_TEXT_LENGTH constant exists, (2) text longer than MAX_TEXT_LENGTH gets truncated, and (3) truncated text is logged appropriately.
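The three checks listed in this comment could look something like the sketch below. It is framework-free so that it runs on its own: `truncateForAi`, `check`, and `runChecks` are hypothetical names, and the stand-in function only mirrors the behavior described in the PR. In the repository's spec file these would presumably be written as test cases against the real `getPageText()` with the project's existing test tooling.

```javascript
// Stand-in for the code under test (hypothetical; mirrors the PR's described behavior).
const CONSTANTS = { LOG_PRE_FIX: 'ChromeAI-Rtd-Provider:', MAX_TEXT_LENGTH: 1000 };

function truncateForAi(text, log) {
  if (text.length <= CONSTANTS.MAX_TEXT_LENGTH) return text;
  log(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
  return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
}

// Minimal assertion helper so the sketch needs no test framework.
function check(condition, label) {
  if (!condition) throw new Error(`FAILED: ${label}`);
  return label;
}

function runChecks() {
  const logged = [];
  const spy = (msg) => logged.push(msg); // records every log call
  const passed = [];

  // (1) The constant exists with the expected value.
  passed.push(check(CONSTANTS.MAX_TEXT_LENGTH === 1000, 'MAX_TEXT_LENGTH is defined'));

  // (2) Text longer than the limit is truncated to exactly the limit.
  const result = truncateForAi('x'.repeat(50000), spy);
  passed.push(check(result.length === CONSTANTS.MAX_TEXT_LENGTH, 'long text is truncated'));

  // (3) The truncation is logged exactly once, with both lengths in the message.
  passed.push(check(
    logged.length === 1 && logged[0].includes('Truncating text from 50000 to 1000'),
    'truncation is logged'
  ));

  // Boundary case: text exactly at the limit is returned as-is and not logged.
  const atLimit = 'y'.repeat(1000);
  passed.push(check(
    truncateForAi(atLimit, spy) === atLimit && logged.length === 1,
    'text at the limit is untouched'
  ));

  return passed;
}

console.log(runChecks().join('\n'));
```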

Collaborator


@drpaulfarrow can you please look into adding a unit test that will cover this new use-case?

Comment on lines +99 to +100
logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);

Copilot AI Dec 23, 2025


Truncating text using substring(0, MAX_TEXT_LENGTH) may cut words in half, potentially affecting language detection and summarization quality. Consider using a smarter truncation strategy that breaks at word boundaries, such as finding the last space before MAX_TEXT_LENGTH or using a method like lastIndexOf(' ') to ensure complete words are preserved.

Suggested change
- logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
- return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
+ // Prefer truncating at a word boundary (whitespace) before MAX_TEXT_LENGTH
+ const hardLimit = CONSTANTS.MAX_TEXT_LENGTH;
+ const searchStart = hardLimit;
+ const lastSpace = text.lastIndexOf(' ', searchStart);
+ const lastNewline = text.lastIndexOf('\n', searchStart);
+ const lastTab = text.lastIndexOf('\t', searchStart);
+ const lastWhitespace = Math.max(lastSpace, lastNewline, lastTab);
+ const truncateAt = (lastWhitespace >= CONSTANTS.MIN_TEXT_LENGTH) ? lastWhitespace : hardLimit;
+ logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${truncateAt} chars (max ${CONSTANTS.MAX_TEXT_LENGTH}).`);
+ return text.substring(0, truncateAt);
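To make the difference between the two strategies concrete, here is a small standalone sketch of word-boundary truncation. It is not the suggested patch itself: `truncateAtWordBoundary` is a hypothetical helper, it takes the limits as parameters instead of reading `CONSTANTS`, and it swaps the three `lastIndexOf` calls for a backwards scan with a `\s` test so that any whitespace character counts as a boundary.

```javascript
// Hypothetical helper: cut at the last whitespace at or before maxLength.
// Falls back to a hard cut at maxLength when no whitespace is found, or when
// the boundary would leave fewer than minLength characters.
function truncateAtWordBoundary(text, maxLength, minLength = 0) {
  if (text.length <= maxLength) return text;

  // Walk backwards from maxLength until a whitespace character is found.
  let cut = maxLength;
  while (cut > 0 && !/\s/.test(text[cut])) {
    cut--;
  }

  const usable = cut > 0 && cut >= minLength;
  return text.substring(0, usable ? cut : maxLength);
}

const sample = 'the quick brown fox jumps';
console.log(sample.substring(0, 12));            // "the quick br" (hard cut, splits "brown")
console.log(truncateAtWordBoundary(sample, 12)); // "the quick"
```

The result is never longer than `maxLength`, so either strategy stays within the same size cap; the only difference is whether the final word is kept whole.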

Collaborator


@drpaulfarrow is there any value to this logic from your perspective? If someone has reached the limit, does it matter whether the last word was cut off cleanly or not?

Collaborator

@jsnellbaker left a comment


@drpaulfarrow did you have the chance to review my previous comment? Are you able to make some unit tests that would cover this change?

@patmmccann
Collaborator

@pm-azhar-mulla could you add feedback?


3 participants