Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions lib/decode_url.ts
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
import { parse, format } from 'url';
import { format } from 'url';
import { unescape } from 'querystring';

const PROTOCOL_RE = /^[a-z0-9.+-]+:/i;

const hasProtocolLikeNode = (str: unknown): boolean => {
if (typeof str !== 'string') throw new TypeError('url must be a string');
return PROTOCOL_RE.test(str.trim());
};

const decodeURL = (str: string) => {
if (parse(str).protocol) {
if (hasProtocolLikeNode(str)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to implement some loose check, why not just if (str.includes('://')?

Copy link
Member Author

@D-Sketon D-Sketon Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, using str.includes('://') fails the unit tests.
{92E0351B-DE8D-4E55-87E9-7367053CFC09}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not a fan of copying some regexp. How would we know if it is spec-compliant or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regexp is from nodejs https://github.com/nodejs/node/blob/a1244f04dea9148c44fd6daf60b5105f7a85ea12/lib/url.js#L96, and it is stricter than the one in RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986#appendix-B). The regex in RFC 3986 is /^(([^:/?#]+):)/.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But whathappened if Node.js changes their implementation (maybe a ReDoS attack is discovered, maybe a CSRF PoC is created, and there is a new CVE created), how can we make sure our copied regexp is up to date?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is such a concern, let's consider a less radical optimization method.

const parsed = new URL(str);

// Exit if input is a data url
Expand Down
11 changes: 9 additions & 2 deletions lib/encode_url.ts
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
import { parse, format } from 'url';
import { format } from 'url';
import { unescape } from 'querystring';

const PROTOCOL_RE = /^[a-z0-9.+-]+:/i;

const hasProtocolLikeNode = (str: unknown): boolean => {
if (typeof str !== 'string') throw new TypeError('url must be a string');
return PROTOCOL_RE.test(str.trim());
};

const encodeURL = (str: string) => {
if (parse(str).protocol) {
if (hasProtocolLikeNode(str)) {
const parsed = new URL(str);

// Exit if input is a data url
Expand Down
Loading