Speed up both encode() and decode() methods.
mdevils committed Mar 15, 2021
1 parent 50b4afa commit 2347178
Showing 4 changed files with 110 additions and 53 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
+2.1.1
+-----
+
+* Speed up both `encode()` and `decode()` methods.
+
 2.1.0
 -----
 
66 changes: 33 additions & 33 deletions README.md
@@ -97,68 +97,68 @@ Common
 Initialization / Load speed
-* #1: html-entities x 2,992,640 ops/sec ±1.76% (82 runs sampled)
-#2: entities x 2,093,859 ops/sec ±1.17% (82 runs sampled)
-#3: he x 1,815,445 ops/sec ±1.30% (87 runs sampled)
+* #1: html-entities x 2,941,745 ops/sec ±1.87% (81 runs sampled)
+#2: entities x 2,061,661 ops/sec ±1.16% (82 runs sampled)
+#3: he x 1,861,758 ops/sec ±1.15% (86 runs sampled)
 HTML5
 Encode test
-* #1: html-entities.encode - html5, nonAsciiPrintable x 427,051 ops/sec ±0.25% (96 runs sampled)
-* #2: html-entities.encode - html5, nonAscii x 427,332 ops/sec ±0.68% (96 runs sampled)
-#3: entities.encodeNonAsciiHTML x 333,348 ops/sec ±1.08% (93 runs sampled)
-* #4: html-entities.encode - html5, extensive x 269,630 ops/sec ±0.26% (98 runs sampled)
-#5: entities.encodeHTML x 126,117 ops/sec ±0.27% (93 runs sampled)
-#6: he.encode x 114,119 ops/sec ±0.20% (96 runs sampled)
+* #1: html-entities.encode - html5, nonAscii x 439,350 ops/sec ±0.21% (96 runs sampled)
+* #2: html-entities.encode - html5, nonAsciiPrintable x 410,462 ops/sec ±0.22% (93 runs sampled)
+#3: entities.encodeNonAsciiHTML x 332,966 ops/sec ±0.54% (92 runs sampled)
+* #4: html-entities.encode - html5, extensive x 280,865 ops/sec ±0.22% (95 runs sampled)
+#5: entities.encodeHTML x 125,338 ops/sec ±0.30% (92 runs sampled)
+#6: he.encode x 112,572 ops/sec ±0.25% (97 runs sampled)
 Decode test
-* #1: html-entities.decode - html5, strict x 347,055 ops/sec ±0.27% (94 runs sampled)
-* #2: html-entities.decode - html5, attribute x 340,751 ops/sec ±0.22% (97 runs sampled)
-* #3: html-entities.decode - html5, body x 333,538 ops/sec ±0.28% (94 runs sampled)
-#4: entities.decodeHTMLStrict x 329,206 ops/sec ±1.64% (92 runs sampled)
-#5: entities.decodeHTML x 278,862 ops/sec ±0.24% (97 runs sampled)
-#6: he.decode x 185,834 ops/sec ±0.23% (96 runs sampled)
+* #1: html-entities.decode - html5, body x 428,051 ops/sec ±0.22% (98 runs sampled)
+* #2: html-entities.decode - html5, strict x 402,821 ops/sec ±0.22% (91 runs sampled)
+* #3: html-entities.decode - html5, attribute x 391,007 ops/sec ±0.33% (90 runs sampled)
+#4: entities.decodeHTMLStrict x 332,909 ops/sec ±0.56% (95 runs sampled)
+#5: entities.decodeHTML x 274,700 ops/sec ±0.29% (97 runs sampled)
+#6: he.decode x 184,440 ops/sec ±0.27% (95 runs sampled)
 HTML4
 Encode test
-* #1: html-entities.encode - html4, nonAscii x 413,667 ops/sec ±0.51% (94 runs sampled)
-* #2: html-entities.encode - html4, nonAsciiPrintable x 390,540 ops/sec ±0.39% (95 runs sampled)
-* #3: html-entities.encode - html4, extensive x 199,258 ops/sec ±0.20% (97 runs sampled)
+* #1: html-entities.encode - html4, nonAscii x 419,600 ops/sec ±0.65% (94 runs sampled)
+* #2: html-entities.encode - html4, nonAsciiPrintable x 413,954 ops/sec ±0.83% (91 runs sampled)
+* #3: html-entities.encode - html4, extensive x 216,838 ops/sec ±0.22% (96 runs sampled)
 Decode test
-* #1: html-entities.decode - html4, strict x 369,977 ops/sec ±1.13% (93 runs sampled)
-* #2: html-entities.decode - html4, body x 366,084 ops/sec ±0.30% (94 runs sampled)
-* #3: html-entities.decode - html4, attribute x 363,317 ops/sec ±0.33% (94 runs sampled)
+* #1: html-entities.decode - html4, strict x 420,850 ops/sec ±0.23% (92 runs sampled)
+* #2: html-entities.decode - html4, body x 413,042 ops/sec ±0.49% (94 runs sampled)
+* #3: html-entities.decode - html4, attribute x 408,538 ops/sec ±2.59% (92 runs sampled)
 XML
 Encode test
-* #1: html-entities.encode - xml, nonAscii x 478,394 ops/sec ±2.54% (92 runs sampled)
-* #2: html-entities.encode - xml, nonAsciiPrintable x 459,013 ops/sec ±0.20% (97 runs sampled)
-#3: entities.encodeXML x 352,570 ops/sec ±1.05% (93 runs sampled)
-* #4: html-entities.encode - xml, extensive x 269,313 ops/sec ±0.24% (92 runs sampled)
+* #1: html-entities.encode - xml, nonAscii x 511,788 ops/sec ±0.21% (97 runs sampled)
+* #2: html-entities.encode - xml, nonAsciiPrintable x 482,136 ops/sec ±0.40% (93 runs sampled)
+#3: entities.encodeXML x 353,189 ops/sec ±0.57% (95 runs sampled)
+* #4: html-entities.encode - xml, extensive x 291,091 ops/sec ±0.23% (96 runs sampled)
 Decode test
-* #1: html-entities.decode - xml, body x 429,601 ops/sec ±0.20% (96 runs sampled)
-* #2: html-entities.decode - xml, strict x 428,820 ops/sec ±0.22% (96 runs sampled)
-#3: entities.decodeXML x 423,011 ops/sec ±0.28% (94 runs sampled)
-* #4: html-entities.decode - xml, attribute x 419,337 ops/sec ±0.66% (94 runs sampled)
+* #1: html-entities.decode - xml, body x 543,327 ops/sec ±0.25% (89 runs sampled)
+* #2: html-entities.decode - xml, attribute x 533,470 ops/sec ±0.22% (94 runs sampled)
+* #3: html-entities.decode - xml, strict x 528,014 ops/sec ±2.27% (95 runs sampled)
+#4: entities.decodeXML x 421,154 ops/sec ±0.32% (96 runs sampled)
 Escaping
 Escape test
-#1: he.escape x 1,126,149 ops/sec ±0.23% (98 runs sampled)
-* #2: html-entities.encode - xml, specialChars x 1,077,095 ops/sec ±1.09% (94 runs sampled)
-#3: entities.escapeUTF8 x 724,973 ops/sec ±0.25% (98 runs sampled)
-#4: entities.escape x 316,363 ops/sec ±0.20% (97 runs sampled)
+* #1: html-entities.encode - xml, specialChars x 1,583,074 ops/sec ±0.24% (95 runs sampled)
+#2: he.escape x 1,131,879 ops/sec ±1.65% (94 runs sampled)
+#3: entities.escapeUTF8 x 736,205 ops/sec ±0.28% (94 runs sampled)
+#4: entities.escape x 314,225 ops/sec ±0.24% (93 runs sampled)
 ```
 
 License
89 changes: 69 additions & 20 deletions src/index.ts
@@ -46,17 +46,42 @@ export function encode(
     if (!text) {
         return '';
     }
+
+    const encodeRegExp = encodeRegExps[mode];
+    encodeRegExp.lastIndex = 0;
+
+    let match = encodeRegExp.exec(text);
+
+    if (!match) {
+        return text;
+    }
+
     const references = allNamedReferences[level].characters;
     const isHex = numeric === 'hexadecimal';
 
-    return text.replace(encodeRegExps[mode], function (input) {
+    let lastIndex = 0;
+    let result = '';
+
+    do {
+        if (lastIndex !== match.index) {
+            result += text.substring(lastIndex, match.index);
+        }
+        const input = match[0];
         const entity = references[input];
         if (entity) {
-            return entity;
+            result += entity;
+        } else {
+            const code = input.length > 1 ? getCodePoint(input, 0)! : input.charCodeAt(0);
+            result += (isHex ? '&#x' + code.toString(16) : '&#' + code) + ';';
         }
-        const code = input.length > 1 ? getCodePoint(input, 0)! : input.charCodeAt(0);
-        return (isHex ? '&#x' + code.toString(16) : '&#' + code) + ';';
-    });
+        lastIndex = match.index + input.length;
+    } while ((match = encodeRegExp.exec(text)));
+
+    if (lastIndex !== text.length) {
+        result += text.substring(lastIndex, text.length);
+    }
+
+    return result;
 }
 
 const defaultDecodeOptions: DecodeOptions = {
@@ -100,24 +125,48 @@ export function decode(
     if (!text) {
         return '';
     }
+    const decodeRegExp = decodeRegExps[level][scope];
+
+    let match = decodeRegExp.exec(text);
+
+    if (!match) {
+        return text;
+    }
+
     const references = allNamedReferences[level].entities;
     const isAttribute = scope === 'attribute';
 
-    return text.replace(decodeRegExps[level][scope], function (entity) {
-        if (isAttribute && entity[entity.length - 1] === '=') {
-            return entity;
+    let lastIndex = 0;
+    let result = '';
+
+    do {
+        const entity = match[0];
+        if (lastIndex !== match.index) {
+            result += text.substring(lastIndex, match.index);
         }
-        if (entity[1] != '#') {
-            return references[entity] || entity;
+        if (isAttribute && entity[entity.length - 1] === '=') {
+            result += entity;
+        } else if (entity[1] != '#') {
+            result += references[entity] || entity;
+        } else {
+            const secondChar = entity[2];
+            const code =
+                secondChar == 'x' || secondChar == 'X' ? parseInt(entity.substr(3), 16) : parseInt(entity.substr(2));
+
+            result +=
+                code >= 0x10ffff
+                    ? outOfBoundsChar
+                    : code > 65535
+                    ? fromCodePoint(code)
+                    : fromCharCode(numericUnicodeMap[code] || code);
         }
-        const secondChar = entity[2];
-        const code =
-            secondChar == 'x' || secondChar == 'X' ? parseInt(entity.substr(3), 16) : parseInt(entity.substr(2));
-
-        return code >= 0x10ffff
-            ? outOfBoundsChar
-            : code > 65535
-            ? fromCodePoint(code)
-            : fromCharCode(numericUnicodeMap[code] || code);
-    });
+
+        lastIndex = match.index + entity.length;
+    } while ((match = decodeRegExp.exec(text)));
+
+    if (lastIndex !== text.length) {
+        result += text.substring(lastIndex, text.length);
+    }
+
+    return result;
 }
3 changes: 3 additions & 0 deletions test/index.test.ts
@@ -19,6 +19,9 @@ describe('encode()', () => {
         expect(encode('a\n<>"\'&©∆℞😂\0\x01', {mode: 'specialChars'})).to.equal(
             'a\n&lt;&gt;&quot;&apos;&amp;©∆℞😂\0\x01'
         );
+        expect(encode('a\n<>"\'&©∆℞😂\0\x01END', {mode: 'specialChars'})).to.equal(
+            'a\n&lt;&gt;&quot;&apos;&amp;©∆℞😂\0\x01END'
+        );
         expect(encode('a\n<>"\'&©∆℞😂\0\x01', {mode: 'nonAscii'})).to.equal(
             'a\n&lt;&gt;&quot;&apos;&amp;&copy;&#8710;&rx;&#128514;\0\x01'
         );
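
The src/index.ts change swaps a `String.prototype.replace` callback for a manual `exec()` loop over a global regular expression. The sketch below isolates that pattern for five special characters only. It is an illustration, not the library's code: `namedReferences`, `specialCharsRegExp`, `escapeWithReplace` and `escapeWithExecLoop` are hypothetical names, and whether the loop is actually faster depends on the JavaScript engine, so it should be benchmarked rather than assumed.

```typescript
// Hypothetical lookup table for this sketch (not the library's reference tables).
const namedReferences: Record<string, string> = {
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&apos;',
    '&': '&amp;'
};

// The `g` flag is essential: without it exec() always restarts at index 0
// and the do/while loop below would never terminate.
const specialCharsRegExp = /[<>"'&]/g;

// Callback style: the engine invokes the function once per match.
function escapeWithReplace(text: string): string {
    return text.replace(specialCharsRegExp, (input) => namedReferences[input]);
}

// Loop style: walk the matches with exec() and build the result by hand.
function escapeWithExecLoop(text: string): string {
    // A shared global regexp keeps state in lastIndex, so reset it defensively.
    specialCharsRegExp.lastIndex = 0;

    let match = specialCharsRegExp.exec(text);
    if (!match) {
        // Nothing to escape: return the input untouched.
        return text;
    }

    let lastIndex = 0;
    let result = '';

    do {
        // Copy the unmatched text between the previous match and this one.
        if (lastIndex !== match.index) {
            result += text.substring(lastIndex, match.index);
        }
        const input = match[0];
        result += namedReferences[input];
        lastIndex = match.index + input.length;
    } while ((match = specialCharsRegExp.exec(text)));

    // Copy whatever follows the final match (the case the added "END" test covers).
    if (lastIndex !== text.length) {
        result += text.substring(lastIndex);
    }

    return result;
}

console.log(escapeWithExecLoop('a <b> & "c" END'));
```

The final `substring` call is the step that is easy to forget when converting from `replace`: without it, any text after the last match would be dropped from the result.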