@shaunyogeshwaran
Member

@shaunyogeshwaran shaunyogeshwaran commented May 26, 2025

Fixes #2466
This PR is still a work in progress and is not yet ready for review.

The PR fulfills these requirements (check all that apply):

  • It's submitted to the main branch.
  • When resolving a specific issue, it's referenced in the PR's title (e.g. feat: Add a button #xxx, where "xxx" is the issue number).
  • When resolving a specific issue, the PR description includes Closes #xxx, where "xxx" is the issue number.
  • If changes were made to the ui folder, unit tests (make test) still pass.
  • New/updated tests are included.

@shaunyogeshwaran shaunyogeshwaran self-assigned this May 26, 2025
@shaunyogeshwaran shaunyogeshwaran added the feature Feature request label May 26, 2025
@shaunyogeshwaran shaunyogeshwaran marked this pull request as draft May 26, 2025 07:37

while (nextPageUrl) {
if (urls.length >= maxPages) break;
if (urls.includes(nextPageUrl)) break;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'https://docs.h2o.ai/h2o-document-ai/get-started/what-is-h2o-document-ai' can be anywhere in the URL, and arbitrary hosts may come before or after it.
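
To illustrate the class of problem this alert describes, here is a minimal sketch comparing a substring test with an exact hostname comparison. The helper names and the evil.example URLs are hypothetical and are not taken from generate-pdf.js; only the docs.h2o.ai host comes from the alert.

```javascript
// Hypothetical helpers for illustration; not part of generate-pdf.js.
const TRUSTED_HOST = 'docs.h2o.ai';

// Weak check: passes whenever the trusted host appears anywhere in the string.
function looksTrustedBySubstring(url) {
  return url.includes(TRUSTED_HOST);
}

// Stricter check: parse the URL and compare the hostname exactly.
function isTrustedByHostname(url) {
  try {
    return new URL(url).hostname === TRUSTED_HOST;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
}

const samples = [
  'https://docs.h2o.ai/h2o-document-ai/get-started/what-is-h2o-document-ai',
  'https://evil.example/?next=https://docs.h2o.ai/h2o-document-ai/',
  'https://docs.h2o.ai.evil.example/h2o-document-ai/',
];

// Print both verdicts side by side for each sample URL.
for (const url of samples) {
  console.log(url, looksTrustedBySubstring(url), isTrustedByHostname(url));
}
```

All three samples pass the substring test, but only the first passes the hostname comparison, because the other two place the trusted host in the query string or in a subdomain of another host.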

Copilot Autofix

AI 6 months ago

To address the issue, we need to validate that nextPageUrl belongs to the trusted domain (docs.h2o.ai) before adding it to the urls array. This can be achieved by parsing the URL using the URL constructor and checking its hostname property against the trusted domain. This approach ensures that only URLs with the correct host are processed, mitigating the risk of malicious URLs bypassing the check.

The changes will involve:

  1. Parsing nextPageUrl using the URL constructor.
  2. Validating that the hostname of nextPageUrl matches the trusted domain (docs.h2o.ai).
  3. Updating the condition on line 16 to include this validation.
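
The three steps above can be sketched as a self-contained loop. This is an illustration under assumptions, not the script's actual code: collectUrls and the getNextPageUrl callback are hypothetical names, and the real script may well discover the next page asynchronously.

```javascript
// Sketch only: collectUrls and getNextPageUrl are hypothetical names.
const TRUSTED_HOST = 'docs.h2o.ai';

function collectUrls(startUrl, getNextPageUrl, maxPages = 100) {
  const urls = [];
  let nextPageUrl = startUrl;

  while (nextPageUrl) {
    if (urls.length >= maxPages) break;

    // Step 1: parse the candidate URL; stop if it is not a valid absolute URL.
    let parsedUrl;
    try {
      parsedUrl = new URL(nextPageUrl);
    } catch (e) {
      console.error(`Invalid URL encountered: ${nextPageUrl}`);
      break;
    }

    // Step 2: stop when the host is not the trusted documentation host.
    if (parsedUrl.hostname !== TRUSTED_HOST) break;

    // Step 3: keep the existing guard against revisiting a page.
    if (urls.includes(nextPageUrl)) break;

    urls.push(nextPageUrl);
    nextPageUrl = getNextPageUrl(nextPageUrl);
  }
  return urls;
}

// Usage with a made-up link map: the third hop leaves the trusted host.
const links = {
  'https://docs.h2o.ai/a': 'https://docs.h2o.ai/b',
  'https://docs.h2o.ai/b': 'https://evil.example/?u=https://docs.h2o.ai/c',
};
const collected = collectUrls('https://docs.h2o.ai/a', (url) => links[url]);
console.log(collected);
```

Separating the parse step from the host check keeps the try block small, so only a malformed URL is reported as invalid.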

Suggested changeset 1
docs-pdf/generate-pdf.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/docs-pdf/generate-pdf.js b/docs-pdf/generate-pdf.js
--- a/docs-pdf/generate-pdf.js
+++ b/docs-pdf/generate-pdf.js
@@ -15,3 +15,9 @@
     if (urls.length >= maxPages) break;
-    if (urls.includes(nextPageUrl)) break;
+    try {
+      const parsedUrl = new URL(nextPageUrl);
+      if (parsedUrl.hostname !== 'docs.h2o.ai' || urls.includes(nextPageUrl)) break;
+    } catch (e) {
+      console.error(`Invalid URL encountered: ${nextPageUrl}`);
+      break;
+    }
 
EOF

Copilot is powered by AI and may make mistakes. Always verify output.

Labels

feature Feature request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a PDF conversion script

1 participant