Chrome AI RTD Provider: fix QuotaExceededError with large page content #14295
drpaulfarrow wants to merge 3 commits into prebid:master
Added MAX_TEXT_LENGTH constant (1000 chars) and text truncation logic in getPageText() to prevent QuotaExceededError when Chrome AI APIs process pages with large amounts of text content. Without this fix, pages with extensive text content can cause the Chrome AI APIs (LanguageDetector, Summarizer) to throw QuotaExceededError exceptions.
Pull request overview
This PR fixes a QuotaExceededError that occurs when Chrome AI APIs (LanguageDetector, Summarizer) process pages with large amounts of text content by implementing text truncation in the getPageText() function.
- Added a MAX_TEXT_LENGTH constant set to 1000 characters
- Implemented truncation logic to limit the text before it is passed to Chrome AI APIs
- Added a log message when text is truncated
```javascript
LOG_PRE_FIX: 'ChromeAI-Rtd-Provider:',
STORAGE_KEY: 'chromeAi_detected_data', // Single key for both language and keywords
MIN_TEXT_LENGTH: 20,
MAX_TEXT_LENGTH: 1000, // Limit to prevent QuotaExceededError with Chrome AI APIs
```
The new MAX_TEXT_LENGTH constant lacks test coverage. The test file checks for MIN_TEXT_LENGTH at line 137 but doesn't verify that MAX_TEXT_LENGTH exists or that the truncation logic works correctly. Consider adding tests to verify: (1) MAX_TEXT_LENGTH constant exists, (2) text longer than MAX_TEXT_LENGTH gets truncated, and (3) truncated text is logged appropriately.
@drpaulfarrow can you please look into adding a unit test that will cover this new use case?
```javascript
logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
```
Truncating text with substring(0, MAX_TEXT_LENGTH) may cut words in half, which could affect language detection and summarization quality. Consider a truncation strategy that breaks at a word boundary, for example using lastIndexOf(' ') to find the last space before MAX_TEXT_LENGTH so that only complete words are kept.
```diff
-logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
-return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
+// Prefer truncating at a word boundary (whitespace) before MAX_TEXT_LENGTH
+const hardLimit = CONSTANTS.MAX_TEXT_LENGTH;
+const searchStart = hardLimit;
+const lastSpace = text.lastIndexOf(' ', searchStart);
+const lastNewline = text.lastIndexOf('\n', searchStart);
+const lastTab = text.lastIndexOf('\t', searchStart);
+const lastWhitespace = Math.max(lastSpace, lastNewline, lastTab);
+const truncateAt = (lastWhitespace >= CONSTANTS.MIN_TEXT_LENGTH) ? lastWhitespace : hardLimit;
+logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${truncateAt} chars (max ${CONSTANTS.MAX_TEXT_LENGTH}).`);
+return text.substring(0, truncateAt);
```
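To make the trade-off concrete, the sketch below compares a plain `substring` cut with the whitespace-boundary approach from the suggestion on a short sample string. `hardTruncate`, `wordBoundaryTruncate`, and the small limits are illustrative assumptions chosen so the effect is visible, not code from the PR.

```javascript
// Small illustrative limits (the PR uses MAX_TEXT_LENGTH = 1000, MIN_TEXT_LENGTH = 20).
const MAX = 26;
const MIN = 5;

// Plain cut, as in the original diff.
function hardTruncate(text, max) {
  return text.length > max ? text.substring(0, max) : text;
}

// Same idea as the suggestion: cut at the last space/newline/tab at or before
// index `max`, falling back to the hard cut if that whitespace is before `min`.
function wordBoundaryTruncate(text, max, min) {
  if (text.length <= max) return text;
  const lastWhitespace = Math.max(
    text.lastIndexOf(' ', max),
    text.lastIndexOf('\n', max),
    text.lastIndexOf('\t', max),
  );
  const cut = lastWhitespace >= min ? lastWhitespace : max;
  return text.substring(0, cut);
}

const sample = 'Prebid real-time data modules enrich bid requests';
console.log(JSON.stringify(hardTruncate(sample, MAX))); // "Prebid real-time data modu"
console.log(JSON.stringify(wordBoundaryTruncate(sample, MAX, MIN))); // "Prebid real-time data"
```

Both versions stay within the limit; the boundary version gives up at most one partial word, and when no whitespace appears at or after index `min` it behaves exactly like the hard cut.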
@drpaulfarrow is there any value to this logic from your perspective? If someone has reached the limit, does it matter whether the word was cut off cleanly or not?
jsnellbaker left a comment
@drpaulfarrow did you have the chance to review my previous comment? Are you able to make some unit tests that would cover this change?
@pm-azhar-mulla could you add feedback?
Type of change
- [x] Bugfix
Description of change
Added MAX_TEXT_LENGTH constant (1000 chars) and text truncation logic in getPageText() to prevent QuotaExceededError when Chrome AI APIs process pages with large amounts of text content.
Root cause: Chrome's built-in AI APIs (LanguageDetector, Summarizer) have internal quotas on input size. When document.body.textContent exceeds these limits, the APIs throw QuotaExceededError exceptions, causing the RTD submodule to fail silently.
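Truncation removes the usual trigger, but the silent failure itself can also be made visible. The following is a sketch that is not part of this PR, assuming the Chrome AI calls are awaited somewhere the module could wrap them: the hypothetical `runWithQuotaCheck` recognizes the error by its `name` (`'QuotaExceededError'`) and logs it, while unrelated errors still propagate. The stub relies on the global `DOMException` available in browsers and recent Node versions.

```javascript
// Hypothetical wrapper: `task` is any async Chrome AI call, for example
// () => detector.detect(text). Quota failures are logged instead of vanishing.
async function runWithQuotaCheck(task, logMessage) {
  try {
    return await task();
  } catch (err) {
    // Identify the quota failure by its error name.
    if (err && err.name === 'QuotaExceededError') {
      logMessage('ChromeAI-Rtd-Provider: input exceeded the Chrome AI quota; skipping.');
      return null;
    }
    throw err; // unrelated failures still propagate
  }
}

// Usage with a stub that fails the way an oversized input would.
const oversized = () => Promise.reject(new DOMException('Input too large', 'QuotaExceededError'));
runWithQuotaCheck(oversized, console.log).then((result) => console.log(result)); // logs the message, then null
```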
Changes:
- Added MAX_TEXT_LENGTH: 1000 to the CONSTANTS object
- Updated getPageText() to truncate text exceeding the limit before passing it to Chrome AI APIs
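Put together, the updated getPageText() flow can be sketched as follows. This is an illustration under assumptions, not the module's code: the page object is passed in rather than read from the global `document`, and the whitespace trimming and the `null` return for pages shorter than MIN_TEXT_LENGTH are guesses at behavior the PR does not show.

```javascript
const CONSTANTS = {
  LOG_PRE_FIX: 'ChromeAI-Rtd-Provider:',
  MIN_TEXT_LENGTH: 20,
  MAX_TEXT_LENGTH: 1000, // Limit to prevent QuotaExceededError with Chrome AI APIs
};

// Hypothetical sketch: `doc` is injected (instead of the global `document`)
// so the function can run outside a browser.
function getPageText(doc, logMessage = console.log) {
  // Assumption: surrounding whitespace is trimmed before the length checks.
  const text = (doc?.body?.textContent ?? '').trim();
  // Assumption: pages below MIN_TEXT_LENGTH are skipped by returning null.
  if (text.length < CONSTANTS.MIN_TEXT_LENGTH) {
    return null;
  }
  // The PR's change: cap the text before it reaches the Chrome AI APIs.
  if (text.length > CONSTANTS.MAX_TEXT_LENGTH) {
    logMessage(`${CONSTANTS.LOG_PRE_FIX} Truncating text from ${text.length} to ${CONSTANTS.MAX_TEXT_LENGTH} chars.`);
    return text.substring(0, CONSTANTS.MAX_TEXT_LENGTH);
  }
  return text;
}

// Usage: a stub page with 50,000 characters of body text.
const bigPage = { body: { textContent: 'x'.repeat(50000) } };
console.log(getPageText(bigPage).length); // 1000
```

With the 50,000-character stub, the function logs the truncation message and returns the first 1000 characters, similar in scale to the 50k+ character pages used in manual testing.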
Testing:
- gulp lint passes
- gulp test --file test/spec/modules/chromeAiRtdProvider_spec.js passes
- Manually verified on pages with 50k+ character content
Other information
This is a minimal, focused fix: only 6 lines added. The 1000-character limit provides enough text for accurate language detection and keyword summarization while staying well within Chrome AI API quotas.